ARCHIVING FILE SYSTEM FOR DATA SERVERS
IN A DISTRIBUTED NETWORK ENVIRONMENT

TECHNICAL FIELD
The present invention relates generally to secondary storage systems,
such as disk drives, tape drives and the like, for computer processing
systems. More particularly, the present invention relates to a file
system for a network data server that automatically manages the
long-term storage and retrieval of large volumes of data as part of a
network data server across multiple types of secondary storage media.

BACKGROUND OF THE INVENTION 
The use of secondary storage systems to provide for online storage for
computer processing systems that is separate from the primary or main
memory of the computer processing system is well known. Examples of
current secondary storage systems include magnetic disk drives, optical
disk drives, magnetic tape drives, solid state disk drives and bubble
memories. Typically, secondary storage systems have much larger memory
capacities than the primary memory of a computer processing system;
however, the access to data stored on most secondary storage systems is
sequential, not random, and the access rates for secondary storage
systems can be significantly slower than the access rate for primary
memory. As a result, individual bytes of data or characters of
information are usually stored in a secondary storage system as part of
a larger collective group of data known as a file.

Generally, files are stored in accordance with one or more predefined
file structures that dictate exactly how the information in the file
will be stored and accessed in the secondary storage system. In most
computer processing systems, the operating system program will have a
file control program that includes a group of standard routines to
perform certain common functions with respect to reading, writing,
updating and maintaining the files as they are stored on the secondary
storage system in accordance with the predefined file structure that
organizes the storage of both control information and data information.
Thus, when a user program executes one of these file functions, such as
a read or write, the user program will actually invoke one of the
standard routines in the file control program that then performs the
actual file system function. In this way, the user program is insulated
from the specific details of the file control program and the predefined
file structure, and a more transparent interface to the secondary
storage system is presented to the user program.
As used within the present invention, the term file system will refer
collectively to the file structure and file control program. Examples of
current file systems for the System V operating system program include
the Unix® System V, Release 3 file system, the Berkeley BSD 4.2 file
system and the combined Unix® System V, Release 4 and Berkeley BSD 4.3
Fast File System, which is the current standard file system for the
latest release of the System V operating system program. For a general
background on System V file systems, reference is made to Bach, M., The
Design of the Unix® Operating System (1986), Prentice Hall, Chpts. 4-5,
pp. 60-145; and Leffler, McKusick, Karels and Quarterman, The Design and
Implementation of the 4.3 BSD Unix® Operating System (1990), Chpt. 7,
pp. 187-223.

In a traditional computer processing system that is not networked, the
secondary storage system is directly connected to the computer
processor(s), and the user program uses the same procedures in the file
system to access all files stored on the secondary storage system. In a
distributed computer network environment, however, the user program must
be able to access both local files, i.e., files stored on secondary
storage systems directly connected to the computer processor, as well as
remote files, i.e., files stored on secondary storage systems that are
accessed via a distributed network. To meet this need to allow user
programs to access both local and remote files in a distributed computer
network environment, certain standardized remote file access
capabilities have been added as an additional software interface layer
on top of the traditional file control program. Examples of remote file
interfaces for a distributed computer network environment using the
System V operating system include Network File System (NFS) and Remote
File System (RFS). For a general background on remote file access in
networked computer processing systems, reference is made to Kochan, S.,
Unix® Networking, Chpts. 4 and 7 (1989), Hayden Books, pp. 93-132 and
203-235.
As the popularity of distributed computer networks has increased, the
demand to store ever increasing volumes of data as remote files has also
increased. In response to this demand, a number of remote secondary
storage systems have been developed primarily for the purpose of storing
remote files. These secondary storage systems, known as data servers,
servers or information servers, are not connected to an individual
computer like a traditional secondary storage device; rather, they are
connected to the distributed network itself. Examples of current large
capacity data servers for a distributed computer network environment
using the System V operating system include: the Epoch-1
InfiniteStorage™ Server available from Epoch Systems, Inc., Westborough,
Massachusetts; the UniTree™ Virtual Disk System available from General
Atomics/DISCOS Division, San Diego, California; and the Auspex NS 5000™
Network Server available from Auspex Systems, Inc., Santa Clara,
California.
Although many network data servers have specialized hardware that
improves the performance and capacity of the networked secondary storage
system, most current network data servers for a System V-based network
environment use the standard System V file systems to control the
storage of remote files on the data server. As a result, these network
data servers are limited in their ability to store, manipulate and
access remote files to only those techniques and procedures that are
generally supported by the standard file systems. Unfortunately, the
standard file systems were originally designed to store files on local
secondary storage systems, not remote secondary storage systems
connected to many different user nodes on a distributed computer
network. Consequently, most data servers have modified the standard
remote file interface which executes on the data server in order to
service the unique requirements of a remote secondary storage system
operating in a distributed network environment. In the UniTree™ Virtual
Disk System, for example, there is no file system in the data server.
Instead, a program that is directly integrated with the NFS remote file
interface manages the storage of data on the remote secondary storage
system as a stream of raw, byte-oriented data, rather than as a set of
files stored as blocks of data. As a result, the data stored by this
system cannot be read by any other computer system without using the
UniTree™ Virtual Disk System to recover the raw, byte-oriented data into
standard block-oriented files. In addition, it is not possible to run a
standard NFS data server on the same network as a UniTree™ Virtual Disk
System.

Another major drawback to current standard file systems is their
inability to support the archiving of files from online media secondary
storage devices, such as magnetic disk drives, to removable media
secondary storage devices, such as optical disk and magnetic tape. Even
with the tremendous amount of data that can be stored on current network
data servers, network system administrators are constantly faced with
the problem of how to most efficiently manage the secondary storage
space on the network. One of the primary ways to free up available space
on the more expensive, but more quickly accessible, online media
secondary storage devices is to archive selected remote files to less
expensive, but less easily accessible, removable secondary storage
devices.
Most data servers either rely on individual users to manually perform
back up of remote files or use some type of least-recently used archival
algorithm whereby all remote files that have not been accessed for a
given period of time are archived to removable secondary storage devices
when the amount of available secondary storage space falls below some
minimum amount. Unfortunately, neither of these techniques provides for
an intelligent or reliable archiving of remote files. Those data servers
that have individual users back up files usually end up requiring system
administrator intervention whenever the amount of available secondary
storage space falls below the required minimum amount to effectively
operate the network data server, because users cannot be relied on to
consistently back up and then remove inactive remote files. On the other
hand, those data servers that automatically archive remote files that
have not been accessed for a given period of time blindly apply the
archiving algorithm to all remote files and end up being unable to
accommodate, for example, online storage of large or very large remote
files that must be quickly accessed at all times, but may have certain
periods of inactivity.

Future network data servers must also be able to accommodate continuing
improvements in distributed network environments and ever increasing
user demands. Improvements in high speed networks, such as the new FDDI
and Fibre Channel standards, will significantly increase the speed at
which remote files can be transferred across a distributed network.
Increasing remote data storage demands, such as the need to support
multi-media data comprised of the simultaneous storage and transfer of
digitized voice data, video and audio, will also significantly expand
the use of network data servers. Again, the standard file systems are
not designed to efficiently accommodate the significantly increased
speed or use of distributed networks and network data servers that will
be required to support the visualization of multi-media files in a
distributed network environment.

While the current standard file systems have been adequate for
controlling the storage and access to local files, it would be desirable
to provide a file system that automatically manages the long-term
storage and retrieval of large volumes of data as part of a network data
server across multiple types of secondary storage media. It also would
be advantageous to provide a file system for network data servers that
is specifically designed to efficiently and reliably control the storage
and access of remote files on remote secondary storage systems, and can
provide for the flexibility to support future developments that will
increase the speed and usage of distributed computer network
environments.

SUMMARY OF THE INVENTION
The present invention is an archiving file system that is specifically
designed to support the storage of, and access to, remote files stored
on high speed, large capacity network data servers. The archiving file
system of the present invention automatically archives remote files
across multiple types of secondary storage media on such network data
servers based on a set of hierarchically selectable archival attributes
that are selectively assigned to each remote file. The archiving file
system is designed to accommodate both small and large remote files, as
well as both small and large numbers of remote files, and to efficiently
and reliably control the long-term storage of and access to remote files
stored on network data servers. The archiving file system is completely
transparent to the user program and operates on remote files by
providing a different file control program and a different file
structure on the network data server, without the need to modify the
standard file system that is native to a particular operating system
program executing on the user nodes or the standard network file
interfaces executing on the distributed computer network environment.

The archiving file system of the present invention comprises a unique
archiving file structure for logically storing the remote files on the
secondary storage device and a novel archiving file control program
executing in the network data server that controls the access to the
remote files stored according to the archiving file structure. Part of
the archiving file structure is a flexible control structure that is
used for storing control information about the remote files as part of
an addressable control file that has space on the data server that is
dynamically allocated in the same manner in which space is allocated for
any other remote file. The control structure also stores the set of
hierarchically selectable archival attributes and one or more archival
blocks associated with each remote file that automatically control the
manner in which that remote file will be stored and ultimately archived,
or even removed, from the network data server. The archiving file
control program automatically manages the storage of, and access to, the
remote files on multiple types of secondary storage media that are part
of the network data server. The archiving file control program even
allows for direct access to remote files which have been archived onto a
long-term randomly positionable, removable secondary storage device
without the need to first stage the archived file onto an online
short-term direct access secondary storage device before the remote file
can be accessed by a user program.

By providing for a set of hierarchically selectable archival attributes
associated with each remote file, the archiving file system allows user
programs to specify how a remote file will be managed on a network data
server, or to rely on default specifications for how the remote file
will be managed on the network data server as specified by a site
administrator, for example. Some of the file management features
supported by the archiving file system and controlled by the archival
attributes stored for each remote file include a file lifespan
attribute, a file cycle attribute, and a file archive media attribute.
The file lifespan attribute defines the length of time after which a
remote file will be automatically deleted from the network data server.
The file cycle attribute defines the number of versions of a remote file
which will be created and maintained on the network data server each
time a new version of the remote file is stored. The file archive media
attribute defines the type of secondary storage media on which the data
server will automatically archive a remote file.
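
The selection of archival attributes described above can be illustrated
with a minimal sketch. It assumes only two levels in the hierarchy
(attributes set for an individual file, then site defaults), and the
names ArchivalAttributes, resolve, lifespan_days, cycles and
archive_media are hypothetical; the actual hierarchy and field layout
are those of the control structure described later in the specification.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArchivalAttributes:
    # None means "not specified at this level; defer to the next level".
    lifespan_days: Optional[int] = None   # file lifespan attribute
    cycles: Optional[int] = None          # file cycle attribute (versions kept)
    archive_media: Optional[str] = None   # file archive media attribute


def resolve(*levels: ArchivalAttributes) -> ArchivalAttributes:
    """Return the effective attributes; levels are ordered most specific first."""
    resolved = {}
    for field in fields(ArchivalAttributes):
        resolved[field.name] = next(
            (getattr(level, field.name) for level in levels
             if getattr(level, field.name) is not None),
            None,
        )
    return ArchivalAttributes(**resolved)


# Hypothetical values: the site administrator supplies defaults and the
# user program overrides only some of them for one remote file.
site_defaults = ArchivalAttributes(lifespan_days=365, cycles=1, archive_media="tape")
file_attrs = ArchivalAttributes(cycles=3, archive_media="optical")
effective = resolve(file_attrs, site_defaults)
```

In this sketch the remote file keeps three versions and is archived to
optical media, while its lifespan falls back to the site default.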

Another feature supported by the archiving file system of the present
invention is that the online direct access secondary storage devices of
a network data server can be organized into storage family sets. Each
storage family set comprises a plurality of physically unique direct
access storage devices that are collectively accessed by the archiving
file system on a block-by-block basis in, for example, a round robin
fashion. In this way, the archiving file system of the present invention
automatically implements software striping by arranging a plurality of
blocks that comprise a remote file to be stored on the network data
server such that selected ones of the blocks are stored in their
entirety on separate ones of the physically unique direct access storage
devices that comprise the storage family set. The archiving file system
can also implement software shadowing on a storage family set in order
to selectively store a shadow copy of a remote file on a separate group
of direct access secondary storage devices. The software shadowing is
accomplished by partitioning the storage family set into a pair of
storage family subsets, each storage family subset having an equal
number of secondary storage devices, and automatically storing the
plurality of blocks comprising the remote file on both storage family
subsets.
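
As a rough illustration of software striping and shadowing on a storage
family set, the following sketch places whole blocks on devices in round
robin order and, for shadowing, repeats the placement on the second half
of the set. The function names and the use of simple lists and
dictionaries are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def stripe(blocks: Sequence[str], devices: Sequence[str]) -> Dict[str, List[str]]:
    """Place each whole block on one device, cycling through devices round robin."""
    placement: Dict[str, List[str]] = {device: [] for device in devices}
    for index, block in enumerate(blocks):
        placement[devices[index % len(devices)]].append(block)
    return placement


def stripe_with_shadow(
    blocks: Sequence[str], devices: Sequence[str]
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Split the family set into two equal subsets and stripe the file on both."""
    if len(devices) % 2 != 0:
        raise ValueError("shadowing needs an even number of devices")
    half = len(devices) // 2
    primary, shadow = devices[:half], devices[half:]
    return stripe(blocks, primary), stripe(blocks, shadow)


blocks = ["b0", "b1", "b2", "b3", "b4"]
striped = stripe(blocks, ["d0", "d1"])
primary_copy, shadow_copy = stripe_with_shadow(blocks, ["d0", "d1", "d2", "d3"])
```

Each block lands on exactly one device of a subset, so consecutive
blocks of a file can be read from different devices in parallel, and the
shadow subset holds a complete second copy.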

Still another feature supported by the archiving file system of the
present invention is the improved management of cache buffers and data
transfers within the network data server. Unlike current file systems
which cache blocks of data in a cache buffer and then access those
blocks of data using a series of hash tables to search a linked list of
block entries stored in the cache buffer, the archiving file system of
the present invention modifies the extent array pointer used by the file
system to reflect whether the block of data is presently stored in a
cache buffer. If a block of data is presently stored in a cache buffer,
then the archiving file system substitutes a pointer that points
directly to the cache buffer, rather than pointing to the logical block
address on the secondary storage device. In the preferred embodiment of
the data server, the archiving file system also manages the transfer of
data within the data server so as to minimize the number of transfers of
data across a common bus within the data server. This eliminates the
need for duplicate transfers of information within the data server of
the preferred embodiment, thereby significantly increasing the overall
transfer speed of the data server.
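
The extent array substitution can be sketched as follows. This is a
simplified model under stated assumptions: the disk is a dictionary from
block address to data, and the class names DiskAddress, CacheBuffer and
RemoteFile are hypothetical. The point it illustrates is that a cached
block is found by following the extent entry itself, with no hash table
search.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DiskAddress:
    block: int            # logical block address on the secondary storage device


@dataclass
class CacheBuffer:
    block: int            # address retained so the entry can be restored later
    data: bytes           # the cached contents of the block


class RemoteFile:
    def __init__(self, disk: Dict[int, bytes], blocks: List[int]) -> None:
        self.disk = disk
        # One extent entry per file block; initially every entry is a disk address.
        self.extents: List[Union[DiskAddress, CacheBuffer]] = [
            DiskAddress(b) for b in blocks
        ]
        self.disk_reads = 0

    def read_block(self, index: int) -> bytes:
        entry = self.extents[index]
        if isinstance(entry, CacheBuffer):
            return entry.data                 # pointer leads straight to the buffer
        data = self.disk[entry.block]         # cache miss: read from the device
        self.disk_reads += 1
        self.extents[index] = CacheBuffer(entry.block, data)  # substitute pointer
        return data

    def evict(self, index: int) -> None:
        entry = self.extents[index]
        if isinstance(entry, CacheBuffer):
            self.extents[index] = DiskAddress(entry.block)    # restore disk address


remote = RemoteFile({7: b"alpha", 9: b"beta"}, [7, 9])
first = remote.read_block(0)
second = remote.read_block(0)   # served through the substituted pointer
```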

A further feature supported by the present invention is an effective and
efficient storage allocation method and apparatus for allocating storage
space on a secondary storage system for both small and large files
within the same file system. At least two different sizes of logical
storage allocation units are utilized to allocate storage space for
files on the secondary storage system. A first, smaller logical
allocation unit is used to allocate space for the beginning of files
until the size of the file passes a predefined maximum small allocation
unit size. Beyond the predefined maximum small allocation unit size for
a file, a second, larger logical allocation unit is used to allocate the
remaining space necessary to store the file. The small and large logical
allocation units are used by the file control program of the operating
system program to map files directly and indirectly to the physical
storage devices in the secondary storage system. In response to a
request to store a file of a given size, the method of the present
invention first allocates one or more of a maximum number of small
logical allocation units. The small logical allocation units represent a
space of a first predefined size in the secondary storage system in
which to store the file. The small logical allocation units are
allocated until a total amount of the space represented by the small
logical allocation units is greater than or equal to the given size of
the file, or until a total number of the small logical allocation units
is equal to the maximum number of small logical allocation units. If the
total number of small logical allocation units is equal to the maximum
number of small logical allocation units, then the present invention
allocates an additional number of large logical allocation units until a
total amount of the space represented by the combination of the small
allocation units and the large allocation units is greater than or equal
to the given size of the file. The large logical allocation units
represent a space of a second predefined size in the secondary storage
system in which to store the file that is larger than the first
predefined size of the small logical allocation units.
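
The two-stage allocation described above can be expressed as a short
sketch. The unit sizes used in the example (4,096 and 65,536 bytes) and
the limit of eight small units are hypothetical values chosen only for
illustration; the function name is likewise an assumption.

```python
from typing import Tuple


def allocate_units(
    file_size: int, small_size: int, large_size: int, max_small: int
) -> Tuple[int, int]:
    """Return (small_units, large_units) needed to hold file_size bytes."""
    if large_size <= small_size:
        raise ValueError("large units must be larger than small units")
    small_units = 0
    covered = 0
    # Stage 1: small units until the file fits or the small-unit limit is hit.
    while covered < file_size and small_units < max_small:
        small_units += 1
        covered += small_size
    # Stage 2: large units cover whatever the small units could not.
    large_units = 0
    while covered < file_size:
        large_units += 1
        covered += large_size
    return small_units, large_units


small_file = allocate_units(10_000, 4_096, 65_536, 8)
large_file = allocate_units(100_000, 4_096, 65_536, 8)
```

With these assumed sizes, a 10,000-byte file fits in three small units,
while a 100,000-byte file uses all eight small units (32,768 bytes) and
two large units for the remainder.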

A still further feature supported by the present invention is the
automatic backup of the control information of a file system for a
secondary storage system in such a way as to provide for a fast and
reliable recovery of the file system in the event of an unscheduled hard
stop of a computer processing system. The file system utilizes control
information that is maintained in a cache memory of the computer
processing system, and a copy of the control information is periodically
backed up to two separate logical devices in the secondary storage
system. As part of each backup, a control stamp value unique to each
iteration of the backup is written to a pair of unique control stamp
locations on the logical devices, one control stamp location being
written prior to the backup of the control information and the other
control stamp location being written after the backup of the control
information. In the event of an unscheduled hard stop of the computer
processing system, the control information for the file system is
quickly and accurately recovered by determining which of the two copies
of the control information is accurate based on a comparison of the
control stamp values in all four control stamp locations. This backup
system can guarantee that the copies of the control information backed
up to the pair of logical devices are not corrupted by only allowing
control information on the logical devices to be updated at certain
site-selectable sync points initiated by the operating system program.
In this way, no matter when an unscheduled hard stop of the computer
processing system occurs, at least one copy of the control information
will not be in a transient or update process as of the time of the hard
stop. As a result, the recovery of the file system is a relatively
simple and relatively fast process involving the determination of which
of the two copies of the control information is accurate based on a
deduction of when the unscheduled hard stop occurred during the ongoing
periodic backup of control information.
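
The role of the four control stamp locations can be illustrated with a
minimal sketch. The comparison rule used here is an assumption, since
this summary does not spell it out: a copy is treated as intact when the
stamp written before it matches the stamp written after it, and the
newest intact copy is selected. The names LogicalDevice, backup and
recover are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogicalDevice:
    pre_stamp: int = 0               # written before the control information
    control: Optional[dict] = None   # backed-up copy of the control information
    post_stamp: int = 0              # written after the control information


def backup(device: LogicalDevice, control: dict, stamp: int,
           stop_before_post: bool = False) -> None:
    """Back up one copy; stop_before_post simulates a hard stop mid-backup."""
    device.pre_stamp = stamp
    device.control = dict(control)
    if stop_before_post:
        return                       # hard stop: the post stamp is never written
    device.post_stamp = stamp


def recover(devices: List[LogicalDevice]) -> dict:
    """Select the newest copy whose two stamps agree (assumed rule)."""
    intact = [d for d in devices
              if d.control is not None and d.pre_stamp == d.post_stamp]
    if not intact:
        raise RuntimeError("no intact copy of the control information")
    return max(intact, key=lambda d: d.post_stamp).control


dev_a, dev_b = LogicalDevice(), LogicalDevice()
backup(dev_a, {"free_blocks": 100}, stamp=1)
backup(dev_b, {"free_blocks": 100}, stamp=1)
# The next backup iteration is interrupted while updating the first device.
backup(dev_a, {"free_blocks": 90}, stamp=2, stop_before_post=True)
recovered = recover([dev_a, dev_b])
```

Because the two devices are updated one after the other, the interrupted
copy on the first device is rejected and the older, consistent copy on
the second device is used.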

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of a prior art file system in a computer
network environment.

Figure 2 is a block diagram of the archiving file system of the present
invention for a network data server.

Figure 3 is a block diagram of the preferred embodiment of a network
data server in which the present invention operates.

Figure 4 is a block diagram of the prior art System V standard file
structure.

Figure 5 is a block diagram of the file structure of the archiving file
system of the present invention.

Figure 6 is a block diagram of the arrangement of index nodes and disk
address extent array for the preferred embodiment of the file structure
shown in Figure 5.

Figure 7 is a block diagram of the file information in each index node
of the preferred embodiment of the file structure shown in Figure 6.

Figure 8 is a schematic diagram of the hierarchy used to select the
archival file attributes shown in Figure 7 that control the archiving of
a remote file.

Figure 9 is a schematic block diagram of the program modules and data
structures that comprise the file control program of the archiving file
system of the present invention.

Figures 10a and 10b are a block diagram and a flow chart, respectively,
showing the prior art method for managing cache buffers.

Figures 11a and 11b are a block diagram and a flow chart, respectively,
showing how cache buffers are managed by the archiving file system of
the present invention.

Figure 12 is a flowchart showing how the archiving file system of the
present invention provides for direct access to remote files on
removable media without requiring that the remote file be staged onto an
online secondary storage device.

Figure 13 is a schematic block diagram of the archiving and space
management functions of the present invention.

Figures 14a, 14b, 14c and 14d are flowcharts of various processes used
by the archiving and space management functions shown in Figure 13.

Figure 15 is a block diagram of a storage family set in accordance with
the present invention.

Figures 16a, 16b, 16c, 16d, 16e and 16f are flowcharts showing how
various file commands are implemented by the archiving file system of
the present invention.

Figure 17 is a block diagram of the arrangement of index nodes and disk
address extent array for the preferred embodiment of the dual size
allocation units of the present invention.

Figure 18 is a block diagram showing how indirect addressing is
accomplished using the preferred embodiment of the dual size allocation
units of the present invention.

Figure 19 is a flow chart showing how the dual size allocation units of
the present invention are allocated sequentially in response to a file
request.

Figure 20 is a flow chart showing how, in the preferred embodiment, dual
size allocation units of the present invention are allocated
non-sequentially in response to a request to allocate a file.

Figure 21 is a timing diagram showing the preferred embodiment of the
file recovery mechanism of the present invention.

Figure 22 is an overall block diagram showing the structure of the
preferred embodiment of the file recovery mechanism of the present
invention.

Figure 23 is a flow chart of the preferred embodiment of the method for
writing the file recovery mechanism to the secondary storage device.

Figure 24 is a flow chart of updating the control information in
accordance with the present invention.

Figure 25 is a flow chart showing the preferred embodiment of the method
of file recovery in accordance with the present invention.

Figure 26 is a flow chart depicting the select intact control structure
step of Figure 25 in greater detail.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring now to Figure 1, a block diagram of a prior art file system in
a computer network environment is shown. A typical computer network
environment will include a plurality of user nodes 10, such as
workstations, graphics terminals or personal computers, which are
connected via a network 12 to one or more data servers 14 and/or process
servers 16. Each user node 10 executes an operating system program 20
that services all of the system requests, such as input/output calls,
from one or more user programs 22. Normally, the operating system
program 20 includes a standard file control program 24 that handles all
of the input/output calls to local files 26 stored on a secondary
storage system 28 that is directly connected to the user node 10. In a
distributed computer network environment, the operating system program
20 typically includes a file switch mechanism 30 for allowing user
programs 22 access to remote files 32 stored elsewhere on the network 12
via a network file interface 34. A similar network file interface 34
within the data server 14 accepts requests from the network 12 to access
a remote file 32 stored on that data server 14. In those prior art
systems that implement a block-based, file-oriented management of the
secondary storage system, the data server 14 executes [...] decodes the
requests and uses the same file switch mechanism 30 and standard file
control program 24 to manage the storage and access of the remote files
32 on the secondary storage systems 36 that are part of the data server
14.

Both the remote files 32 stored on the remote secondary storage system
36 and the local files 26 stored on the local secondary storage system
28 are stored and accessed in accordance with a file system tree
structure, such as shown at 38 for the remote files 32 and at 29 for the
local files 26. The file system tree structure 29, 38 is organized as a
file tree with a single root node that is the root directory. Every
non-leaf node of the file tree 29, 38 is a directory or subdirectory of
files, and files at a leaf node of the file tree 29, 38 are either empty
directories, regular files, or special files. In the prior art standard
file system, the entire contents of a particular file tree 29, 38 must
be stored on the same physical secondary storage device 28, 36, although
a single physical secondary storage device 28, 36 may provide storage
for two or more file trees 29, 38. Typically, the operating system
program 20 will issue a mount command to the standard file control
program 24 in order for user programs 22 at a particular user node 10 to
have access to all of the files 26, 32 stored under that particular file
tree 29, 38. In some prior art references, the file trees 29, 38 are
sometimes referred to as a file system. To avoid confusion in the
description of the present invention, the manner in which files and
directories are organized on secondary storage systems from the
perspective of the user program 22 will be referred to as a file tree,
whereas the file control program together with the file structures for
logically organizing and accessing the file trees by the file control
program will be referred to as a file system.

Referring now to Figure 2, a block diagram of the archiving file system
of the present invention for a network data server is shown. The user
nodes 10 are organized identically to the prior art; however, the data
server 14 is provided with an archiving file system (afs) control
program 40 in addition to the standard file control program 24. The
operating system program 20 in the data server 14 of the present
invention preferably uses the same type of file switch mechanism 30 as
used in the user node 10 to switch incoming requests for remote files 42
stored on a remote file tree 44 to use the remote file interface 40,
rather than the standard file system program 24. This allows the
archiving file system of the present invention to operate completely
transparently to the user program 22. Because the afs control program 40
is programmed to handle any file requests normally made [...], there is
no need to modify the standard file system program 24 stored at a user
node 10 on the network 12. This allows for the straightforward and
simple installation of a data server 14 executing the archiving file
system of the present invention onto a given network 12 without the need
to modify or update any of the user nodes 10 on that network 12.

In the preferred embodiment, the archiving file system of the present
invention is capable of supporting a number of different types of
secondary storage media. These different types of media devices can be
online storage devices 46, such as magnetic disk drives, or can be
removable storage devices 48, such as optical disk or magnetic tape.
Unlike most prior art file systems, the archiving file system of the
present invention allows remote files 42 stored on removable storage
devices 48 to be considered as part of the remote file tree 44. The
archiving file system of the present invention also can access removable
media 49 in the removable storage devices 48 either indirectly as
archival storage, or directly as secondary storage, through the use of a
control structure known as a resource file. The ability to access
removable media 49 directly as secondary storage eliminates the need to
stage a remote file 42 stored on a removable media 49 to an online
storage device 46 before that remote file 42 can be accessed by a user
program 22.

The preferred embodiment of the archiving file system of the
present invention is a System V based file system that presents a standard
Unix® System V, Release 4.0, file system interface to the user nodes 10
running the standard System V operating system program 20. The
description of the preferred embodiment set forth below will describe the
hardware and software environment of the preferred data server 14, the
organization of the preferred file structure, including the control structure,
utilized by the archiving file system, and, finally, the various details of the
afs control program 40.

Network Data Server

Referring now to Figure 3, a block diagram of the preferred
embodiment of a network data server 14 is shown. Although the
archiving file system of the present invention will be described in
operation on the preferred embodiment of the network data server 14
shown in Figure 1, it will be understood that the archiving file system of
the present invention is equally applicable to other types of data servers,
such as the data server shown, for example, in U.S. Patent No. 5,163,131 to
Row et al.

In the preferred embodiment, the network data server 14 is
implemented using a number of microprocessor boards operating in a
pipelined, multiprocessing environment and all connected to a common
backplane VME bus 52 for inter-processor communication within the data
server 14. In this configuration, one or more communication processors
54 or an EtherNet® port of a host processor 56 are used to interface the
network data server 14 with the distributed computer networks, such as
EtherNet® 12a, FDDI 12b, or Fibre Channel or any other type of
Transmission Control Protocol/Internet Protocol (TCP/IP) network. The
host processor 56 executes standard Unix® System V, Release 4.0,
operating system code to allow the data server 14 to present the standard
Unix® interface to the networks 12 to which the data server 14 is
connected. One or more real-time file system processors 58 execute the
afs control program 40 of the present invention as described in more
detail below. One or more device processors 60 are also connected to the
VME bus 52 to physically control the I/O operations of the plurality of
remote secondary storage devices 46, 48 that are connected to the
particular device processor 60 in the data server 14 by one or more SCSI
busses 62a, 62b.

In the preferred embodiment, each of the file system processors 58 is
assigned one or more unique device processors 60 to control, and no
device processor 60 is assigned to more than one file system processor 58
in order to minimize synchronization and contention problems within
the data server 14. Each of the device processors 60 contains a buffer
memory 64 connected by direct DMA access 65 to the ports for the SCSI
busses 62a and 62b. The buffer memory 64 is also connected to the VME
bus 52 to form part of a VME global data buffer memory space accessible by
all of the processor boards 54, 56, 58 and 60. In the preferred embodiment,
each buffer memory 64 has a unique 16 Mbytes of VME memory space, and
the data server 14 may be populated with a total of fourteen device
processors 60, each having a buffer memory 64 with 16 Mbytes of memory
space for a total VME global memory space of 224 Mbytes for the data
server 14. The buffer memory 64 in each device processor 60 is managed
by the data server 14 of the present invention so as to implement a direct
DMA transfer between the buffer memory 64 and the communication
processor 54. This eliminates the need for duplicate transfers of
information within the data server 14 when responding to a transfer
request, thereby significantly increasing the overall transfer speed of the
data server 14.

The pipelined, multiprocessing environment is preferred for the
data server 14 so as to distribute the work load of responding to user nodes
10 on the networks 12 that initiate requests for remote files 42. When a
request for a remote file 42 has been received over a network 12 by a
communication processor 54, it is partially cracked or decoded. Cracking
refers to the decoding of the commands that make up the request for the
remote file 42 so that the specified operation and file name are known to
the data server 14. The partially cracked command is then passed on to the
host processor 56 where the cracking or decoding is completed.
Once a remote file command has been completely cracked by the host
processor 56, the host processor 56 passes that command over the VME bus
52 to the afs control program 40 executing in the file processor 58 that has
been assigned responsibility for the remote file tree 44 on which the
requested remote file 42 is stored.

Network File Interface

In the preferred embodiment, the requests for remote files 42 are
transferred across the network 12 through the network file interface 34 as
remote file commands. Examples of network file interfaces 34 for a
distributed computer network environment using the System V operating
system include: Network File System (NFS) and Remote File System (RFS)
that are each complete record-based network file interfaces and the File
Transfer Protocol (FTP) which is a simple file-based network file transfer
protocol. In the case of NFS or RFS, a remote file command might be to
get or put a record. In the case of FTP, a remote file command would be to
read or write a whole file. For a general background on network file
interfaces and remote file commands in networked computer processing
systems, reference is again made to Kochan, S., Unix® Networking, Chpts.
4 and 7 (1989) Hayden Books, pp. 93-132 and 203-235. Although the
preferred embodiment of the data server 14 utilizes a standard network
file interface 34, it will be understood that the archiving file system of the
present invention could be utilized equally as well with any number of
enhanced network file interfaces.

File Switch Mechanism

As shown in Figures 2 and 3, the preferred embodiment of the
present invention takes advantage of the resident file switch mechanism
30 within the System V operating system program 20 to allow the host
processor 56 to route a remote file command to the afs control program 40
executing in the file system processor 58, rather than routing the remote
file command to the standard file interface program 24. In the preferred
embodiment, the file switch mechanism 30 is the Vnodes file switch layer
developed by Sun Microsystems. For a more detailed description of the
operation and functionality of the Vnodes file switch layer, reference is
made to Kleiman, S., "Vnodes: An Architecture for Multiple File System
Types in Sun UNIX", Conference Proceedings, USENIX 1986 Summer
Technical Conference and Exhibition, pp. 238-246; and Sandberg, R. et al.,
"Design and Implementation of the Sun Network File System",
Conference Proceedings, USENIX 1985, pp. 119-130.

Although the preferred embodiment utilizes the resident file switch
mechanism 30 of the particular native operating system program 20 for
the data server 14, it will be understood that the file switch mechanism 30
may be added to the native operating system program 20 executing in the
host processor 56 to allow the operating system program 20 to select
between the standard file control program 24 and the afs control program
40 as a secondary file system. Alternatively, the standard file control
program 24 of the native operating system program 20 can simply be
replaced by the afs control program 40, in which case there may be no need
for the file switch mechanism 30 within the operating system program 20,
although certain functions of the standard file control program 24, such as
initial program load and system paging, would need to be replicated in the
afs control program 40.
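
The selection between the two file control programs can be pictured with a
minimal Python sketch. The class names, the read method and the use of a
mount prefix to decide which program handles a path are all assumptions
made for illustration; the text states only that the file switch selects
between the standard file control program 24 and the afs control program 40.

```python
class StandardFileControl:
    """Stand-in for the standard file control program 24."""

    def read(self, path):
        return f"standard:{path}"


class AfsControl:
    """Stand-in for the afs control program 40."""

    def read(self, path):
        return f"afs:{path}"


class FileSwitch:
    """Hypothetical file switch: routes by longest matching mount prefix."""

    def __init__(self, default):
        self.default = default
        self.mounts = {}

    def mount(self, prefix, file_system):
        self.mounts[prefix.rstrip("/")] = file_system

    def resolve(self, path):
        best = ""
        for prefix in self.mounts:
            inside = path == prefix or path.startswith(prefix + "/")
            if inside and len(prefix) > len(best):
                best = prefix
        return self.mounts[best] if best else self.default

    def read(self, path):
        # The caller never sees which file control program served the request.
        return self.resolve(path).read(path)


switch = FileSwitch(StandardFileControl())
switch.mount("/archive", AfsControl())
```

In this sketch a caller issues the same read call for every path, which is
one way to picture the transparency to the user program that the text
describes.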

Prior Art File Structure 

Referring now to Figure 4, a block diagram of the prior art file
structure for a System V based standard file system is shown. The
description set forth below provides an overall understanding of the
System V standard file system for purposes of distinguishing this file
system from the archiving file system of the present invention. For a
detailed explanation of the structure and operation of the entire System V
based standard file system, reference is again made to Bach, M., The Design
of the Unix® Operating System, Chpt. 4 (1986), Prentice Hall, pp.
60-90; and Leffler, McKusick, Karels and Quarterman, The Design and
Implementation of the 4.3 BSD Unix® Operating System, (1990), Chpt. 7,
pp. 187-223.

In the System V standard file system, each file tree 38 is allocated a
predefined amount of space on one of the remote secondary storage
devices 36. As previously mentioned, in the prior art System V standard
file system, all of the storage space assigned to a given file tree 38 must be
located on the same physical secondary storage device 36; however, more
than one file tree 38 can be defined on a single physical secondary storage
device 36. A super block 70 having a predefined size stores certain control
information at the beginning of the space assigned for each file tree 38.
Following the super block 70, a predetermined number of logical control
blocks 72 are allocated to store index nodes or inodes 74 for each of the files
32 that are contained within the file tree 38. In the preferred embodiment,
there are sixteen inodes 74 stored for each logical control block 72. The
remaining space assigned to the file tree 38 is for data blocks 76 that will be
assigned by the file control program 24 to files and directories in that file
tree 38 using the control structures stored in both the super block 70 and
the inodes 74.

The super block 70 contains certain control information that is used
by the control program to manage the files and directories stored in the file
tree 38. This control information includes a file tree size field 80, free
block information 82, inode information 84 and a root directory pointer 86
that points to an inode 74 containing the logical address of the
directory blocks that store the files and directories in the root directory for
the file tree 38. In the System V standard file system, the file control
program 24 maintains a list of free blocks and an index of free blocks as
part of the free block information 82 and then uses this information to
assign unused data blocks 76 in response to a request to store a new file
or increase the size of an existing file. The file control program 24 also
maintains a similar list of free inodes and index of free inodes as part of
the inode information 84 that is used to manage the assignment of inodes
74 within the control blocks 72.

Each inode 74 contains certain file access information 90 necessary
for the file control program 24 to access a file. In addition to the file access
information 90, each inode also contains a disk address extent array 92 that
acts as a table of contents for the file by storing the addresses of the logical
disk blocks 76 which are assigned to this file. The file access information
90 includes:

file owner        Identifies an individual owner and a group owner
                  who are always allowed to access the file.
file type         Defines whether the file is a regular file, a directory, or
                  a special file.
file size         Defines the size of the file in bytes.
file access perm  Defines the read/write/execute permission for the
                  owner, group owner, and all other users in the system.
file access time  Identifies the last time the file was modified, the last
                  time the file was accessed, and the last time the inode
                  74 for the file was modified.
link field        Stores the number of names the file has in the
                  directory.

The file control program 24 uses the block information 82 to assign
logical disk blocks 76 to a file and then modifies the disk address extent
array 92 in the inode 74 for that file to indicate which logical disk blocks 76
have been assigned to that file. The file control program 24 also uses and
maintains the other fields of the access information 90 in the inode 74 to
manage access to that file.

Archiving File System File Structure

Referring now to Figure 5, a block diagram of the archiving file
system (afs) file control structure 100 of the preferred embodiment of the
present invention is shown. In contrast to the prior art file structure
shown in Figure 4 which preallocates a certain amount of storage for the
super block 70 and a predefined number of the control blocks 72, the afs
file control structure 100 only preallocates storage for a single file tree
super block 102 for each file tree 44. All of the remaining space assigned to
the file tree 44 is for disk blocks 104 that will be assigned by the afs control
program 40 as data blocks for files and directories in that file tree 44, as well
as dynamically allocated control blocks for inodes 106. Unlike the prior art
inodes 76 which are stored in preallocated space in the file tree 39, the afs
file control structure 100 stores the inode control information for each
remote file 42 as an addressable file, inode 106, having space on the
secondary storage system 46, 48 that is dynamically allocated in the same
manner in which space is allocated for any other remote file or directory
in that file tree 44.

The super block 102 is also unlike the super block 70 of the prior art
file system. The super block 102 contains two file tree control blocks 108-1
and 108-2, and a disk block allocation bit map 110. The afs control program
40 uses the disk block allocation bit map 110 to assign unused disk blocks
104. For example, the afs control program 40 might assign a particular disk
block 104 as a data block in response to a request to store a new file or
increase the size of an existing file, or the afs control program 40 might
assign a particular disk block 104 as a control block in response to a request
to open a new file. Unlike the prior art file control program 24 which uses
a list and index of free blocks to keep track of disk blocks 76, the afs control
program 40 uses the disk block allocation bit map 110 as a bit-indicator of
whether the disk block 104 is assigned or available, with one bit in the disk
block allocation bit map uniquely representing every disk block 104
assigned to that particular file tree 44. The afs control program 40
maintains a working copy of the disk block allocation bit map 110 in the
private memory of the file system processor 58 and scans the bit map 110
for available disk blocks 104 in a forward round-robin manner in response
to a request to allocate space in the file tree 44.
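
The forward round-robin scan of the bit map can be illustrated with a
minimal Python sketch. The class name, the method names and the idea of
resuming each scan from a saved cursor are assumptions made for
illustration; the text states only that one bit represents each disk block
and that the scan proceeds forward in a round-robin manner.

```python
class BlockBitMap:
    """Hypothetical working copy of a disk block allocation bit map."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)  # one bit per disk block
        self.cursor = 0  # assumed: the next scan resumes after the last hit

    def is_allocated(self, block):
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def allocate(self):
        """Scan forward from the cursor, wrapping once; None if full."""
        for step in range(self.num_blocks):
            block = (self.cursor + step) % self.num_blocks
            if not self.is_allocated(block):
                self.bits[block // 8] |= 1 << (block % 8)
                self.cursor = (block + 1) % self.num_blocks
                return block
        return None  # every block is already assigned

    def release(self, block):
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF


bitmap = BlockBitMap(4)
first = [bitmap.allocate() for _ in range(3)]  # takes blocks 0, 1 and 2
bitmap.release(0)
after_release = bitmap.allocate()  # scan moves forward to block 3 first
wrapped = bitmap.allocate()        # then wraps around to the freed block 0
exhausted = bitmap.allocate()      # no free block remains
```

Because the scan keeps moving forward, a freshly released block is not
reused until the cursor wraps around to it.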

The preferred embodiment of the afs control structure 100
maintains two separate file tree control blocks 108-1 and 108-2 as part of the
super block 102. Each file tree control block 108 [...]
field 112; a file tree time stamp field 114; and a field 116 for other file tree
attributes. As described in further detail hereinafter, the afs control
program 40 alternates updating the file tree control blocks 108-1 and 108-2
as part of an automatic backup procedure that enables the file system to
ensure the proper recovery of all valid file transactions in the event of a
fault or hardware failure on the data server 14, e.g., an unexpected loss of
power.
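
The alternating update of two control blocks can be sketched in Python as
follows. This is an illustration under stated assumptions, not the
procedure of the afs control program 40: the class and field names are
hypothetical, and the recovery rule shown here (trust whichever copy
carries the newer time stamp) is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ControlBlock:
    """Hypothetical stand-in for one file tree control block."""
    time_stamp: int = 0  # loosely corresponds to time stamp field 114
    attributes: dict = field(default_factory=dict)


class SuperBlock:
    def __init__(self):
        self.control_blocks = [ControlBlock(), ControlBlock()]
        self.next_slot = 0  # assumed: a simple toggle between the two copies

    def update(self, time_stamp, attributes):
        """Overwrite one copy only; the other keeps the previous state."""
        self.control_blocks[self.next_slot] = ControlBlock(
            time_stamp, dict(attributes)
        )
        self.next_slot ^= 1

    def latest(self):
        """Assumed recovery rule: pick the copy with the newer time stamp."""
        return max(self.control_blocks, key=lambda cb: cb.time_stamp)


sb = SuperBlock()
sb.update(100, {"files": 1})
sb.update(200, {"files": 2})
sb.update(300, {"files": 3})  # replaces the copy written at time 100
```

If a failure interrupted the third update, the copy written at time 200
would still be intact, which is the point of never overwriting both copies
at once.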

In the archiving file system of the present invention, the amount of
space needed to store the inodes 106 for a file tree 44 is allocated on an
as-needed basis. The result is that if a large number of very small remote
files 42 are stored on a particular file tree 44, then more space will be
allocated to store the inodes 106 for those files. In contrast, if a small
number of very large remote files are stored on a particular file tree, then
very little space will need to be allocated to store the inodes 106 for those
files. In this way, the archiving file system avoids the problems of
the prior art file systems that have a fixed amount of space in the form of a
predefined number of control blocks 72 reserved for storing inodes 76. In
the prior art file systems, if there are not enough control blocks 72
preassigned for storing inodes 76, then empty space in the file tree 39
cannot be used when there is no more room to allocate additional inodes
76; or, if there are too many control blocks 72 preassigned for storing
inodes 76, then space is wasted by the control blocks 72 that are unused
when all of the disk blocks 74 have been allocated to a small number of
large remote files 32.

In the preferred embodiment of the afs control structure 100, the
very first disk block 104-0 assigned to a file tree 44 is defined to contain at
least three inodes 106: the inode directory inode 106-0, the inode allocation
bit map inode 106-1, and the root directory inode 106-2 for the file tree 44.
Figure 6 shows the arrangement of an inode 106 in accordance with the
preferred embodiment of the present invention. In this embodiment,
each inode 106 occupies 256 bytes and there are four inodes which are
stored in a logical disk block 104 of a size of 1K bytes. Each inode 106
contains file access information 118 and a disk address extent array 120. In
addition to the normal file access information 90 found in the prior art,
the file access information 118 of the present invention contains a
hierarchically selectable set of archival attributes 140 and one or more
archive block pointers 143 to archive blocks 144 (Figure 7) that are used by
the afs file system to perform archiving of the remote files in an
intelligent and efficient manner that is selectable by the individual user.
The disk address extent array 120 contains block number pointers
corresponding to the logical disk blocks 104 which have been allocated for
the particular file represented by the inode 106. An additional step of
transforming the logical block number pointers to one or more actual
physical addresses in the form of cylinder and sector number on a disk
drive, for example, is typically performed by the device controller 60.

As described in further detail in connection with the description of
the buffer management module, two versions of the inode 106 are actually
supported by the afs file structure 100, a device version of the inode 106
that is resident on the secondary storage device 46, 48 and a buffer version
of the inode 106 that is resident in the buffer memory 64 of the data server
14.

Storage Allocation

In the preferred embodiment, the logical block number pointers for
the disk blocks 104 include data addresses for direct allocation units 122
and for indirect level pointers 124. The direct allocation units 122 include
a plurality of small allocation units (smau) 126 and a plurality of large
allocation units (lgau) 128. In the preferred embodiment, each small
allocation unit 126 is a logical disk block 104 of 1K byte in size and each
large allocation unit 128 is a logical disk block of 16K bytes in size. The
information contained in the extent array 120 is stored in the inode 106 as
byte addresses that are right shifted 10 bits (1024) whereby the byte
addresses delineate units of 1024 bytes or, in this case, the small allocation
unit 126. Information for any particular logical disk block 104 is stored as
four bytes of byte address and one byte that defines the logical disk ordinal
for implementing the storage family disk set feature described below. In
this embodiment, the inode 106 contains direct allocation units 122 for
sixteen small allocation units 126 and eight large allocation units 128, and
the indirect level pointers 124 include a plurality of first indirect level
pointers 130, second indirect level pointers 132 and third indirect level
pointers 134. In the preferred embodiment, each indirect level pointer 124
is a large allocation unit 128. Each first indirect level pointer 130 stores the
data address of another large allocation unit disk block 105 that contains
the direct allocation units 122 for the first indirect level. Each second
indirect level pointer 132 stores the data address of a set of first indirect
level pointers 130. Each third indirect level pointer 134 includes the data
address of a set of second indirect level pointers 132.
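
The five-byte extent entry described above can be illustrated with a short
Python sketch. The function names, the big-endian byte order and the
placement of the ordinal byte after the address are assumptions for
illustration; the text specifies only four bytes of right-shifted byte
address and one byte of logical disk ordinal.

```python
import struct

SHIFT = 10            # byte addresses are stored right shifted by 10 bits
SMAU = 1 << SHIFT     # small allocation unit: 1K bytes
LGAU = 16 * SMAU      # large allocation unit: 16K bytes


def pack_extent(byte_address, disk_ordinal):
    """Pack one entry as 4 bytes of shifted address plus 1 ordinal byte.

    The byte order and field order used here are assumed, not sourced.
    """
    if byte_address % SMAU:
        raise ValueError("address must fall on a 1K boundary")
    return struct.pack(">IB", byte_address >> SHIFT, disk_ordinal)


def unpack_extent(entry):
    """Recover the byte address and the disk ordinal from a packed entry."""
    shifted, disk_ordinal = struct.unpack(">IB", entry)
    return shifted << SHIFT, disk_ordinal


entry = pack_extent(5 * LGAU, 2)
# Space reachable through the direct units alone: sixteen smau + eight lgau.
direct_capacity = 16 * SMAU + 8 * LGAU
```

With these sizes the direct allocation units of one inode cover 144K bytes
before any indirect level pointer is needed.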

Referring now to Figure 17, a file storage allocation system for the
present invention includes an inode 550 and a plurality of logical storage
allocation units 552. In the preferred embodiment, the inode 550 need not
be stored on the same physical storage device as the file. In the preferred
embodiment, four inodes 550 of 256 bytes each are stored per logical 1K
block. The inode 550 includes a plurality of file attributes 554 and the
extent array 556. The file attributes 554 include, for example, file
ownership information 558, file type information 560, file access
information 562, file size information 564 and other file attribute
information 566.

In the preferred embodiment, the extent array 556 includes data
addresses 568 which represent logical storage allocation units 552 as
defined on the actual physical storage devices. A further step of
transforming the logical block data address 568 to one or more actual
physical addresses in the form of cylinder and sector number on a disk
drive, for example, is typically performed by a device controller. It will be
recognized, however, that the block allocation information stored in the
extent array 556 could be either logical address information or actual
physical address information, or even some combination of logical and
physical address information.

In the preferred embodiment, the data addresses 568 for the logical
storage units 552 include data addresses 568 for direct allocation units 570
and for indirect level pointers 572. The direct allocation units 570 include
a plurality of small allocation units (smau) 574 and a plurality of large
allocation units (lgau) 576. In the preferred embodiment, each small
allocation unit 574 is a logical storage unit 552 of 1K byte in size and each
large allocation unit 576 is a logical storage unit 552 of 16K bytes in size. In
this embodiment, the direct allocation units 570 include sixteen small
allocation units 574 and eight large allocation units 576.

It will be recognized that the logical storage unit 552 may have a one-
to-one relationship with designated storage areas of a particular physical
storage device; or there may be a one-to-multiple relationship with
multiple designated physical storage areas on another physical storage
device (or devices) comprising a single logical storage unit 552. In one
embodiment, for example, a storage device such as a disk cylinder may be
partitioned so that the designated storage areas at the beginning of the
cylinder correspond in a one-to-one relationship with the small allocation
units 574 and the designated storage areas in the remaining portion of the
cylinder directly correspond with the large allocation units 576. Because
the file system accesses the data on the storage device in terms of logical
allocation units 574, 576, the advantages of the present invention will be
realized, regardless of whether there is a one-to-one or one-to-multiple
correspondence between the size of the allocation units 574, 576 and the
size of the designated physical storage areas.

In the preferred embodiment, the indirect level pointers 572 include
a plurality of first indirect level pointers 578, second indirect level pointers
580 and third indirect level pointers 582. In the preferred embodiment,
each indirect level pointer 572 is a large allocation unit 576. Each first
indirect level pointer 578 stores the data address of the direct allocation
units 570. Each second indirect level pointer 580 stores the data address of
a set of first indirect level pointers 578. Each third indirect level pointer
582 includes the data address of a set of second indirect level pointers 580.

The information contained in the extent array 556 is stored in the
inode 550 as byte addresses that are right shifted 10 bits (1024) whereby the
byte addresses delineate units of 1024 bytes or, in this case, the small
allocation unit 574. Information for any particular logical storage unit 552
is stored as 4 bytes of byte address and 1 byte that defines the logical disk
ordinal for implementing the disk set feature as described in greater detail
in the previously-identified co-pending application. In this way, a total of
3,276 pointers to additional storage units 552 can be stored in a single 16K
indirect level pointer 572, i.e. 16K divided by 5 bytes per allocation extent.
In this embodiment, the total addressable storage space exceeds 500,000
Terabytes.
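
The division of a 16K indirect block by the five-byte entry size can be
worked through in a few lines of Python. The variable names are
hypothetical, and the last figure is a derived illustration of what a
four-byte address counting 1K units can express for one disk ordinal; it is
not a restatement of the total capacity quoted in the text.

```python
ENTRY_BYTES = 4 + 1                # 4 address bytes plus 1 disk ordinal byte
INDIRECT_BLOCK_BYTES = 16 * 1024   # one large allocation unit

# Whole entries that fit in one indirect block, and the bytes left over.
pointers_per_block = INDIRECT_BLOCK_BYTES // ENTRY_BYTES
unused_bytes = INDIRECT_BLOCK_BYTES % ENTRY_BYTES

# Derived, not sourced: the span of a 32-bit address field in 1K units.
max_address_per_ordinal = (2 ** 32) << 10
```

The integer division leaves four bytes of each indirect block unused, since
16,384 is not a multiple of five.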

In operation, when a file control processor (not shown) receives a
request to retrieve a file, the file control processor accesses the data in the
file by calculating the location of the logical allocation unit 552 on a
particular storage device from the data block addresses 568 in the extent
array 556 of the inode 550 through the well-known technique of byte
offset. Those skilled in the art will recognize that the logical disk ordinal
for implementing the disk set feature as described in greater detail in the
previously-identified co-pending application provides the starting point
for calculating the appropriate byte offset.

Referring to Figures 18-20, when the file control processor receives a
request to save a file, the file control processor divides the data to be
written into logical storage units 552 and saves the file according to a
forward allocation method. Following the forward allocation method, the
file control processor allocates the logical storage units 552 of the file across
the small and large allocation units 574, 576 of the direct allocation units
570 first and then across the large allocation units 576 of the indirect level
pointers 572.

Referring to Figure 19, the storage allocation method for
sequentially storing a file in accordance with the present invention begins
(step 500) by testing whether the file has been completely allocated (step
502). If the file has been completely allocated, the process ends (step 504).

If the file has not been completely allocated, the file control
processor tests whether all the small allocation units 574 of the direct
allocation units 570 have been allocated (step 506). If not, the file processor
allocates one or more small allocation units 574 of the direct allocation
units 570 to storing file data (step 508).

If all of the small allocation units 574 of the direct allocation units
570 have been allocated, the file processor tests whether all the large
allocation units 576 of the direct allocation units 570 have been allocated
(step 510). If not, the file processor allocates one or more large allocation
units 576 of the direct allocation units 570 to storing file data (step 512).

If all of the large allocation units 576 of the direct allocation units
570 have been allocated, the file processor allocates one or more large
allocation units 576 referenced by an indirect level pointer 572 (step 514).
The file processor continues this allocation process (steps 502-514) until the
file is completely allocated.
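
The sequential method can be summarized as a loop. The following Python
sketch is a simplified illustration, assuming one unit is allocated per
pass and that a file is described only by its size in bytes; the function
name and the string labels are hypothetical, and the step numbers in the
comments refer to the steps named in the text.

```python
from collections import Counter

SMAU = 1024        # small allocation unit, 1K bytes
LGAU = 16 * SMAU   # large allocation unit, 16K bytes
DIRECT_SMAU = 16   # direct small units per inode
DIRECT_LGAU = 8    # direct large units per inode


def allocate_sequential(file_size):
    """Return the kinds of units allocated, in order, for file_size bytes."""
    units = []
    allocated = 0
    smau_used = lgau_used = 0
    while allocated < file_size:        # step 502: completely allocated yet?
        if smau_used < DIRECT_SMAU:     # step 506: small direct units left?
            units.append("direct-smau")  # step 508
            smau_used += 1
            allocated += SMAU
        elif lgau_used < DIRECT_LGAU:   # step 510: large direct units left?
            units.append("direct-lgau")
            lgau_used += 1
            allocated += LGAU
        else:
            units.append("indirect-lgau")  # step 514
            allocated += LGAU
    return units                        # step 504: allocation complete


small_file = Counter(allocate_sequential(20 * 1024))
big_file = Counter(allocate_sequential(16 * SMAU + 8 * LGAU + 1))
```

A 20K file consumes all sixteen small units and one large unit, while a
file one byte larger than the direct capacity spills into a single large
unit reached through an indirect level pointer.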

It will be understood by those skilled in the art that a file may be
stored non-sequentially through repetitive allocation of the logical storage
units 552 of a file by the file processor in response to an allocation request
for each storage unit 552 based on the byte offset. Referring to Figure 20, in
the preferred embodiment, the storage allocation method for non-
sequentially storing a file in accordance with the present invention begins
(step 520) by inputting the byte offset for the logical storage unit 552 (step
522). Next, the file processor tests whether the byte offset is less than the
size of a small allocation unit 574 (step 524). If the byte offset is less than
the size of the storage capacity of the small allocation unit 574, the file
processor allocates a small allocation unit (smau) 574 of the direct
allocation units 570 to storing the file data in that logical storage unit 552
(step 526) and stops (step 534).

If the byte offset is not less than the size of a small allocation unit
574, the file processor tests whether the byte offset is less than the size of
the storage capacity of both the large allocation units 576 and the small
allocation units 574 of the direct allocation units 570 (step 528). If the byte
offset is less than the capacity of the direct allocation units 570, the file
processor first allocates the small allocation units 574 and then the large
allocation units 576 of the direct allocation units 570 until sufficient space
for the logical storage unit 552 has been allocated (steps 520, 524). Those
skilled in the art will recognize that allocating the small allocation units 574
and then the large allocation units 576 of the direct allocation units 570
favors the allocation of contiguous areas of physical storage, thereby
enhancing access performance and improving the utilization of storage
resources. In addition, partitioning of the physical storage device into
storage areas corresponding to the small allocation units 574 and large
allocation units 576 enhances data transfer rates by reducing file
fragmentation and minimizes the repositioning of the read mechanism in
each storage device.

If the byte offset is not less than the capacity of the direct allocation
units 570, the file control processor allocates one or more large allocation
units 576 referenced by an indirect level pointer 572 (step 532). As those
skilled in the art will understand, the file control processor allocates a
sufficient number of large allocation units 576 through use of the first,
second and third level indirect pointers 578, 580, 582 to contain the data
indicated by the byte offset and then ends the allocation process (step 534).
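
The decision chain for a byte offset can be sketched in Python as follows.
The function names and labels are hypothetical, and the second function
assumes that every unit preceding the offset is allocated in forward order,
which is one reading of "until sufficient space has been allocated" rather
than a confirmed detail of the method.

```python
SMAU = 1024        # small allocation unit, 1K bytes
LGAU = 16 * SMAU   # large allocation unit, 16K bytes
DIRECT_SMAU = 16
DIRECT_LGAU = 8
DIRECT_CAPACITY = DIRECT_SMAU * SMAU + DIRECT_LGAU * LGAU


def branch_for_offset(byte_offset):
    """Pick the branch of the decision chain for a byte offset."""
    if byte_offset < SMAU:             # step 524
        return "small unit"            # step 526
    if byte_offset < DIRECT_CAPACITY:  # step 528
        return "direct units"
    return "indirect units"            # step 532


def units_to_cover(byte_offset):
    """Allocate units in forward order until byte_offset lies inside one."""
    units, covered = [], 0
    while covered <= byte_offset:
        if len(units) < DIRECT_SMAU:
            kind, size = "direct-smau", SMAU
        elif len(units) < DIRECT_SMAU + DIRECT_LGAU:
            kind, size = "direct-lgau", LGAU
        else:
            kind, size = "indirect-lgau", LGAU
        units.append(kind)
        covered += size
    return units
```

For example, an offset of exactly 16K falls just past the sixteen small
units, so the sketch allocates all of them plus the first large direct
unit.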

File Recovery Backup

Briefly, the afs control program 40 alternates updates of the file tree
control information between a primary and a secondary online device 46
that store the file tree 44. In a recovery situation, the afs control program
40 will examine the file time stamp field 114 in both of the file tree control
blocks 108-1 and 108-2 on both the primary and secondary online device 46
to determine which file tree control [...]
on the latest valid transactions which have occurred for that file tree 44.

Referring to Figure 22, in the preferred embodiment of the present
invention, the file recovery apparatus 600 includes one or more control
stamp sets 602, 604, one or more control information structures 606, 608 for
each file, one or more secondary storage devices 610, 612 and a timing
device (not shown). Those skilled in the art will understand that the
timing device may be any electronic clock common to computer
processing systems. The control sets 602, 604 each include a start control
stamp 614, 618 (designated A and C respectively) and an end control stamp
616, 620 (designated B and D respectively). In the preferred embodiment,
the control stamps 614, 616, 618, 620 have the same value. For ease of
reference, the secondary storage devices 610, 612 are referred to as a first
disk 610 and a second disk 612, though those skilled in the art will
recognize that a secondary storage device may be, for example, a tape drive,
optical disk drive or jukebox or hard disks.

In operation, in the preferred embodiment, the timing device provides site-selectable sync points for coordinating the writing of updated control information from the memory cache to the control information structures 606, 608 stored on the secondary storage devices 610, 612 and coordinates the release of allocated inodes and blocks. In addition to the sync points, the update of the control information from the memory cache to the secondary storage devices 610, 612 can also be forced by the file control program under certain conditions, such as table overflows of the inode release table or the block release table, or in the event of an interrupt sensing loss of AC power, for example.

Referring to Figures 21 and 23, the method for utilizing the file recovery apparatus 600 in accordance with the present invention begins once the sync point is reached (step 640). The file system first generates a control stamp value (step 641) and then writes the control stamp value to the start control stamp 614 on the first disk 610 (step 642). The file system then writes the control information to the control structure 606 on the first disk 610 (step 644) and writes the control stamp value to the end control stamp 616 on the first disk 610 (step 646).

The file system then "shadows" the control set 602 and control information on the first disk 610 on the second disk 612 (steps 648-652) by making another copy of the information. Specifically, the file system writes the control stamp value to the start control stamp 618 on the second disk 612 (step 648) and then writes the control information to the control structure 608 on the second disk 612 (step 650). The file system writes the control stamp value to the end control stamp 620 on the second disk 612 (step 652) and ends (step 654). With reference to the previously identified co-pending application, a control set 602 of control stamps 614, 616 is described as File Tree Time Stamps stored in the Super Block for File Tree. The file system repeats this process (steps 640-654) to continually update the control information stored in the control structures 606, 608 on the disks 610, 612.
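The ordering of the six writes in steps 642-652 can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Disk` class, the `sync_point` function and its `crash_after` parameter are hypothetical names introduced here to model an interrupted update, and real devices would be written through the device processors rather than Python attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    """In-memory stand-in for one secondary storage device (assumed model)."""
    start_stamp: Optional[int] = None
    control: Optional[dict] = None
    end_stamp: Optional[int] = None


def sync_point(first: Disk, second: Disk, control_info: dict, stamp: int,
               crash_after: Optional[int] = None) -> int:
    """Perform the writes in the order of steps 642-652.

    crash_after is a hypothetical test hook: stop after that many writes
    to imitate an unscheduled hard stop. Returns the number of writes done.
    """
    writes = [
        lambda: setattr(first, "start_stamp", stamp),            # step 642
        lambda: setattr(first, "control", dict(control_info)),   # step 644
        lambda: setattr(first, "end_stamp", stamp),              # step 646
        lambda: setattr(second, "start_stamp", stamp),           # step 648
        lambda: setattr(second, "control", dict(control_info)),  # step 650
        lambda: setattr(second, "end_stamp", stamp),             # step 652
    ]
    done = 0
    for write in writes:
        if crash_after is not None and done >= crash_after:
            break
        write()
        done += 1
    return done
```

Because each control structure is bracketed by a start stamp and an end stamp carrying the same value, an interrupted update leaves at least one mismatched pair of stamps that can later be detected.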

Referring to Figure 24, the preferred embodiment for updating the control structures as shown in steps 644 and 650 of Figure 23 will be described. The sync point specified for the System V-based operating system program provides the starting point (step 658) for the updating of the control structures in the preferred embodiment. The first step of the update process (step 660) is to merge the released inode numbers from the inode allocation mechanism. As described in the previously identified co-pending application, the allocation mechanism for inodes in the preferred embodiment is an inode allocation bit map, although it will be understood that other allocation methods such as table or link lists could also be used. The updated allocation bit map is then written to the disk (step 662). The released disk blocks are then merged into the disk block allocation mechanism (step 664). Again, the preferred embodiment uses a disk block allocation bit map, but other allocation methods would work equally as well with the present invention. The updated disk block allocation mechanism is written to the disk (step 666). Finally, the updated directory information and updated inode information are written to the disk (step 668). Once all of the control information has been written from the cache to the disk, the sync point is complete (step 670) and no control information will be changed or written onto the disk until the next sync point.

Not releasing allocated inodes and blocks until the control information is updated at the sync points reduces the potential for conflicts between free and allocated inodes and blocks occurring due to corrupted control information. By writing the inode allocation mechanism to the disk first, the preferred embodiment prevents the unwanted condition of having an inode structure point to a disk block which has already been released. By writing the directory information last, the preferred embodiment also prevents the unwanted condition of having a directory entry that points to disk blocks or inode structures that are incorrect.

Referring now to Figure 25, in the event of an unscheduled hard stop, the file recovery method in accordance with the present invention begins by identifying an intact control structure (steps 672, 674), resets the control information in the cache to be consistent with an intact control structure (step 676) and ends (step 678). Those skilled in the art will recognize that standard data recovery techniques can be used to recover data lost once reliable control information is identified.

Referring to Figures 21 and 26, the identification step (step 674) of Figure 25 begins by testing whether the value of the start control stamp 614 on the first disk 610 is equivalent to the value of the end control stamp 620 on the second disk 612 (steps 680, 682). If the start control stamp 614 is equivalent to the end control stamp 620, then the control structure 606 is intact and should be used (step 684) and the identification step returns (step 692). Referring to Figure 21, if the control stamps 614 and 620 are equivalent, then the control information written to the control structure 606 was not corrupted during the unscheduled hard stop and therefore is reliable. By inference, in this situation, the unscheduled hard stop must have occurred sometime in period 5.

If the start control stamp 614 is not equivalent to the end control stamp 620, then the file system tests whether the value of the start control stamp 614 on the first disk is equivalent to the value of the end control stamp 616 on the first disk 610 (step 686). If the start control stamp 614 is equivalent to the end control stamp 616, then the control structure 606 is intact and should be used (step 688) and the identification step returns (step 692). Referring to Figure 21, if the control stamps 614 and 616 are equivalent, then the control information written to the control structure 606 was not corrupted during the unscheduled hard stop and therefore is reliable. By inference, in this situation, the unscheduled hard stop must have occurred sometime during periods 3 or 4.

If the start control stamp 614 is not equivalent to the end control stamp 616, then the control information stored in the control structure 608 on the second disk 612 and bracketed in time by the prior start control stamp 618' and prior end control stamp 620' is intact and should be used (step 690) and the identification step returns (step 692). Referring to Figure 21, if control stamp 614 and control stamp 616 are not equivalent, the unscheduled hard stop must have occurred during period 61 and so the information in the control structure 606 bracketed by the start control stamp 614 and end control stamp 616 has been corrupted and should not be used. Therefore, the intact control structure is the control structure 608' written prior to [...]

The use of the sync points to update control information and generate the control stamps 614, 616, 618, 620 allows the file system to generally pinpoint the timing of the unscheduled hard stop to a particular period and thus more accurately and quickly determine the actual status of control information and data at the time of the unscheduled hard stop. Determining the actual status of the control structures and data at the time of the unscheduled hard stop eliminates the need to trace each file in the system through a transaction log to ensure its proper linkage during recovery and reduces the time required to reset control information, especially in large distributed network systems with many files and user nodes. In addition, the use of more than one secondary storage device in the preferred embodiment of the present invention provides redundancy and enhances reliability of the control information.
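The identification logic of steps 680-690 reduces to two stamp comparisons. The following is a minimal sketch assuming each disk is modeled as a dictionary with `start`, `end` and `control` keys; these names and the `identify_intact` function are illustrative assumptions only.

```python
def identify_intact(first: dict, second: dict):
    """Pick the control structure to trust after an unscheduled hard stop.

    first / second model the two disks; each holds the start stamp, the end
    stamp and the control structure last written to that disk (assumed model).
    """
    # Steps 680-684: start stamp A on the first disk matches end stamp D on
    # the second disk, so the whole update completed on both disks.
    if first["start"] == second["end"]:
        return first["control"]
    # Steps 686-688: A matches B, so the first disk was fully written even
    # though the shadow copy on the second disk was not completed.
    if first["start"] == first["end"]:
        return first["control"]
    # Step 690: the first disk was interrupted mid-update, so the copy on the
    # second disk from the previous sync point is the intact one.
    return second["control"]
```

In the third case the second disk has not yet been touched by the interrupted update, which is why its older control structure can still be trusted.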

Control of Archival Process

Referring now to Figures 7 and 8, the hierarchically selectable archival file attributes and the archive block pointer of the present invention will be described. Figure 7 shows the preferred embodiment of the file access information 118 that is defined for each remote file 42. In addition to the standard file access information 138, the afs control structure 100 provides for a hierarchically selectable set of archival file attributes 140 and one or more archive block pointers 143 that point to up to four archive blocks 144 that are associated with each remote file 42. The afs control program 40 uses a unique hierarchy and resolution order for determining the unique set of archival file attributes 140 that will be selected to control the archiving of each remote file 42. The archival file attributes 140 determine how many copies of a remote file 42 will exist within the data server 14, how long a remote file 42 is to be maintained and on what media the remote file 42 is to reside.

As part of the archival process, the afs control program 40 uses a cycles attribute 141 to determine whether to create cycles of previous versions of a file. When a new version of a file is created, the previous version will be saved as a cycle if cycles 141 are enabled for that file. The user may specify the number of cycles to be maintained and their life span by setting the following attributes:

Cycle Limit        Specifies the maximum number of cycles that can exist for the file. Once the limit has been reached, the oldest existing cycle will be released each time a new cycle is created.

Cycle Life Span    Specifies the life span or time to live criteria for the cycles of a file. Once the life span has been exceeded, the cycle is eligible for termination. The life span of cycles cannot be greater than the life span of the file itself. In the preferred embodiment, Cycle Limit and Cycle Life Span may be set only at the File Level of the hierarchy.
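The interaction of Cycle Limit and Cycle Life Span can be sketched as follows. This is an illustrative model only: the `CycledFile` class and its method names are hypothetical, and the sketch assumes that a cycle's life span is measured from the moment the previous version is saved as a cycle, which the text above does not specify.

```python
from collections import deque


class CycledFile:
    """Toy model of a file that keeps previous versions as cycles."""

    def __init__(self, cycle_limit: int, cycle_life_span: float,
                 cycles_enabled: bool = True):
        self.cycle_limit = cycle_limit
        self.cycle_life_span = cycle_life_span
        self.cycles_enabled = cycles_enabled
        self.current = None
        self.cycles = deque()  # (saved_at, data), oldest first

    def write_version(self, data, now: float) -> list:
        """Store a new version; return any cycles released by the limit."""
        released = []
        if (self.current is not None and self.cycles_enabled
                and self.cycle_limit > 0):
            if len(self.cycles) >= self.cycle_limit:
                # Limit reached: the oldest existing cycle is released.
                released.append(self.cycles.popleft()[1])
            self.cycles.append((now, self.current))
        self.current = data
        return released

    def expired_cycles(self, now: float) -> list:
        """Cycles whose life span has been exceeded (eligible for termination)."""
        return [d for (t, d) in self.cycles if now - t > self.cycle_life_span]
```

Note that an expired cycle is only reported as eligible for termination; the sketch does not remove it, in keeping with the wording that eligibility and actual release are separate events.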
A second part of the archival process is the automatic migration of remote files 42 from on-line storage 46 to archival media 48 for backup and data security purposes. Migration may occur automatically at site-defined time intervals or by direct operator intervention in accordance with the file migration attribute 142. Up to four copies of archival media may be specified for each remote file 42. Whether a remote file 42 will be archived, how many archive copies will exist, on what type of removable media 49 the archive copies will reside, how the archive copies will be accessed and how long the archive copies will last is determined by the following parameters in the file migration attribute 142:

Life Span          Specifies a life span for the remote file. The life span may be specified in days, weeks, months or years. Once the life span has been exceeded, the remote file is eligible for termination. Termination will not normally occur unless media space is needed.

Media Residency    Specifies which media types and formats are acceptable for storing or archiving the file. The specification can be either general (i.e. tape) or specific (i.e. 3480 tape). The residency requirements may be specified for on-line storage and up to four levels of archival storage. These criteria allow for the control of risk and cost associated with storage of the file.

Direct Access      Specifies whether the contents of a file resident on optical disk can be directly accessed from the archive medium without being staged to on-line storage 46. Direct access from tape is not allowed.

The final part of the archival process to be controlled by the file information 118 is the archival block 144, which assigns a set of parameters that identify, define ownership, define access control and specify the location of that unit of the archival media for that remote file 42. These parameters for the archival block 144 include:

Media Type             Identifies the type of media. Archival media is some form of tape, optical disk, or other permanent and transportable removable media. For example, tape would be identified as either 3480 or VHS.

Volume Serial Name     A machine readable name assigned to the media. This combined with the media type uniquely identifies the storage entity.

Location               Identifies the physical location of the given storage entity. This information may be used by automated mounting systems such as a jukebox, or for manual operations such as room and rack location.

Access                 Identifies whether the access mode for the media is read, write or read/write.

For optical removable media, the following additional attributes are included:

File_ID     Identifies the file identifier recorded on the optical media.

Owner_ID    Identifies the owner of the optical media.

Group_ID    Identifies the group of the optical media.

Version     Identifies the version of the optical media.

For magnetic tape removable media, one additional attribute is included:

NoRewind    Indicates the rewind status of the magnetic tape removable media.
Figure 8 shows the various hierarchy levels that are used by the afs control program 40 for determining the archival file attributes 140 that will be used to control storage and archiving of a particular remote file 42. In the preferred embodiment, the various levels are File Level 145, Directory Level 146, User Level 147, Group Level 148 and Global Level 149. The archival file attributes 140 either can be directly defined for an individual remote file 42, or can be defaulted in accordance with the hierarchy. At each level of the hierarchy, the scope of the level encompasses a larger group of remote files as the priority level increases. Direct association of an attribute level at a given level can be made only to a level of greater priority, i.e., attributes at the User Level 147 can only be directly associated to the Group Level 148 and the Global Level 149. The File Level 145 and Directory Level 146 attributes may be set by the user. The User Level 147, Group Level 148 and Global Level 149 attributes are maintained by a system administrator. In the preferred embodiment, the User Level 147 and Group Level 148 attributes are maintained as separate files in the file tree 44 and are accessed by the afs control program 40 if a mode attribute field in the file access information 118 indicates that the User Level 147 or Group Level 148 attributes are to be used for a particular remote file 42. The Global Level 149 attributes are maintained as data values specified by the system administrator within the private memory of the afs control program 40.
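Defaulting an attribute through the hierarchy can be sketched as a lookup that walks the levels in order. The exact resolution order used by the afs control program 40 is not given in this passage, so the sketch assumes the simplest reading: the narrowest level that defines the attribute wins, from File Level up to Global Level. The `LEVELS` tuple, the `resolve` function and the attribute names are all hypothetical.

```python
# Assumed search order, narrowest scope first.
LEVELS = ("file", "directory", "user", "group", "global")


def resolve(attribute: str, settings: dict) -> tuple:
    """Return (level, value) for the first level that defines the attribute.

    settings maps a level name to a dict of attribute values set at that level.
    """
    for level in LEVELS:
        value = settings.get(level, {}).get(attribute)
        if value is not None:
            return level, value
    raise KeyError(f"attribute {attribute!r} is not defined at any level")
```

For instance, with a life span defined at both the file and global levels and a copy count defined only at the group and global levels, the life span would come from the file level and the copy count from the group level.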

Archiving File System File Control Program

Referring now to Figure 9, the overall relationship between the principal control modules Dispatch 150, IOH 152, IOD 154 and the program modules 156 with the primary data structures, command packets 158 and table structures 160, for the archiving file system (afs) control program 40 will be described. The preferred embodiment of the afs control program 40 executes in the file system processors 58 and communicates with the host processor 56 via the IOH (input/output host) module 152 and with the device processors 60 via the IOD (input/output device) module 154. Both the IOH module 152 and IOD module 154 have a pair of in and out buffers 162 and 164, and 166 and 168 in which remote file commands that were received from or transmitted to the VME bus 52 are stored. As will be appreciated by a programmer skilled in the art, the IOH module 152 and IOD module 154 have appropriate pointers and flags for managing the buffers 162, 164, 166 and 168, and for communicating the commands with the Dispatch module 150.

The Dispatch module 150 executes remote file commands received from the host processor 56 via the IOH module 152 by using a table lookup procedure to access the command packets 158 which define the device level operations required to complete a particular remote file command. Based on the command packet 158 being executed, the Dispatch module 150 calls the program modules 156 to execute the command packet 158, and, if required, build a device level command packet which is sent to the device controller 60 via the IOD module 154. The function of each of the program modules 156 is described in detail below.

In the preferred embodiment of the afs control program 40, the table structures 160 are stored in the private memory of the file control processor 58 and are not part of the global VME data space. The mount table 162 contains a table of certain control information relating to each file tree 44 in the network data server 14 that is mounted by any user node 10 on the network 12. The Bit Alloc Map 164 stores the in core copy of the disk block allocation bit map 110. The Buffer Headers 166 contain a table of certain control information relating to each cache buffer that is defined in the buffer memories 64. The Inode Extent Table 168 contains the in core copy of those disk address extent arrays 120 which are currently being utilized by the FS module 186. The Family Set Table 170 contains a table of certain control information to support the definitions of storage family sets as that feature is described below. The Storage Device Table 172 contains a table of certain control information [...] characteristics of the particular secondary storage devices 46, 48 attached to the network data server 14. A Release Inode Table 174 and Release Block Table 176 contain listings of any inodes 106 or logical blocks 104 that have been released by the afs file control program since the last system sync point. This information is used as part of the updating of the control information from the buffer memories 64 to the secondary storage devices as previously described. A request table 178 contains a listing of all of the removable media 49 which have been defined by the system administrator as being available to be accessed by the afs file system. This information is used by the RM module 182 as a table of contents for searching for removable media 49. The manner in which the Dispatch module 150 and the program modules 156 utilize the various tables in the table structures 160 is described in detail below.

The program modules 156 of the preferred embodiment of the afs control program 40 include a buffer manager module 180 that manages the pointers for the cache buffers defined in the buffer memories 64 to set up the DMA transfers across the VME bus 52 between the buffer memories 64 and the communication processors 54. The buffer manager module 180 is accessed by four program modules 156 that are called by the Dispatch module 150 to process a file command and/or automatically manage the file: the removable media manager (RM) module 182, the archiving (AR) module 184, the file system (FS) module 186, and the input/output (IO) module 188. The manner in which each of these modules operates and uses the table structures 160 is described in detail below. Two other program modules 156 are run periodically by the Dispatch module 150 or respond to interrupt demands for handling the removable media: an automated media loader (AML) module 190 and a scanner (Scan) module 192. The manner in which each of these modules interacts with a removable media resource file defined for each removable media 49 is also described in detail below.

For a more detailed description of operation of the afs control program from the perspective of the user node 10 or the system administrator, including a listing of the commands available for the afs file system that are a superset of the standard System V file system commands, reference is made to the "Operations and Reference Guide" for the Visualization File System for the Integrated Data Station (Feb. 1993), available from Large Storage Configurations, Inc., Minneapolis, Minnesota, the disclosure of which is hereby incorporated by reference.

Buffer Manager module 

Referring now to Figures 10a and 10b, the manner in which the pool of cache buffers of the prior art System V-based file systems are managed will be briefly described in order to compare the prior art method of buffer management to the way in which the pool of cache buffers of the present invention are managed by the buffer manager module 180. For a more detailed description of the structure and management of cache buffers in the prior art System V-based file systems, reference is made to Bach, M., The Design of the Unix® Operating System, Chpt. 3, (1986), Prentice Hall, pp. 33-58.

As shown in Figure 10a, the prior art file system uses a series of hash chains 200 that contain a double link list of the cache buffer pointers 202, in combination with a circular double link free list 204 of inactive cache buffers to manage the cache buffers. All of the cache buffers defined for the file system are represented by a single cache buffer pointer 202 in the hash queues and no two cache buffers may contain the information from the same disk block 76. When the prior art file system receives a request for a remote file 32, the disk address extent array 92 in the inode 74 is examined to determine the device and block number 201 of the disk block 76 that is being requested (step 206 - Figure 10b). The prior art file system then searches through the hash chains 200 using the device and block number 201 of the requested disk block 76 to determine if that disk block is resident in the pool of cache buffers (step 207). If a match is found in the hash chain search, then the information in the cache buffer pointed to by the matching cache buffer pointer 202 is used to satisfy the request (step 208). If no match is found, then the requested disk block 76 is read into the next free cache buffer pointed to by the free list 204 (step 209) and that cache buffer is used to satisfy the request (step 20[...]).

Even though the hash chains 200 in the prior art are organized and searched according to a hashing algorithm that attempts to distribute the cache buffer pointers 202 evenly so as to minimize the impact on system performance, the time spent searching the hash chains 200 obviously increases the time required to respond to a request for a disk block 76 that is cached. In addition, although the hash chains 200 are not supposed to have duplicate or incorrect cache buffer pointers, it is possible for the hash chains 200 to become corrupted and incorrectly point to the wrong locations in the cache memory from which to get the requested information for the disk block 76.

Referring now to Figures 11a and 11b, the manner in which the buffer manager module 180 manages the cache buffers of the present invention will be described. In contrast to the prior art method of buffer management, the afs file system of the present invention modifies an extent array pointer 210' in the disk address extent array 120 to reflect that that disk block 104 is presently stored in a cache buffer defined in the buffer memories 64. As with the prior art method, the buffer manager module 180 gets an extent address pointer 210 of the requested disk block 104 from the in core version of the inode 106 (step 216 - Figure 11b). If the disk block 104 pointed to by the extent array pointer 210 is presently stored in a cache buffer, then the buffer management module 180 uses the modified extent array pointer 210' stored in the in core version of the inode 106 to point directly to the cache buffer (step 217). If the disk block 104 pointed to by the extent array pointer 210 is not presently stored in a cache buffer, then the buffer management module 180 reads the disk block 104 into a free cache buffer as determined from a free list 212 of cache buffers that operates like the prior art free list 204 (step 218) and modifies the in core version of the inode 106 to point to that cache buffer (step 219), rather than the device and block number of the disk block 104 as stored on the secondary storage devices 46, 48.
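The effect of modifying the in core extent pointer, as opposed to searching hash chains, can be sketched as follows. The `ExtentPointer` and `BufferManager` classes are hypothetical stand-ins for the extent array pointer 210/210' and the buffer manager module 180; buffer eviction and DMA transfers are deliberately not modeled.

```python
class ExtentPointer:
    """In core extent entry: disk address plus an optional cache buffer index."""

    def __init__(self, device: int, block: int):
        self.device = device
        self.block = block
        self.buffer = None  # set once the block is cached (modified pointer)


class BufferManager:
    def __init__(self, n_buffers: int, read_block):
        self.free = list(range(n_buffers))   # simple free list of buffer indexes
        self.buffers = [None] * n_buffers
        self.read_block = read_block         # callable(device, block) -> data
        self.disk_reads = 0

    def get(self, ptr: ExtentPointer):
        if ptr.buffer is not None:
            # Step 217: the pointer leads directly to the cache buffer,
            # so no hash chain search is needed.
            return self.buffers[ptr.buffer]
        if not self.free:
            raise RuntimeError("no free cache buffer (eviction not modeled)")
        idx = self.free.pop(0)               # step 218: take a free buffer
        self.buffers[idx] = self.read_block(ptr.device, ptr.block)
        self.disk_reads += 1
        ptr.buffer = idx                     # step 219: repoint the in core entry
        return self.buffers[idx]
```

A second request for the same block follows the stored buffer index and performs no lookup and no disk read, which is the performance benefit the text attributes to this scheme.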

Removable Media (RM) module 

Referring again to Figure 9, the RM module 182 manages all of the removable media 49 for the network data server 14 in connection with the AML module 190 and the Scan module 192. The RM module 182 may be called by the FS module 186 or the AR module 184, depending upon whether the removable media 49 is being accessed directly in the manner described below, or is being used as an archival media. The afs control program 40 provides the user programs 22 with transparent access to remote files 42 which are stored on removable storage media 49 (i.e., magnetic tape, optical disk, tape cartridges) through the use of a control structure known as a removable media resource file 194.

The removable media resource file 194 allows remote files 42 stored on removable media to be truly considered as an integral part of the file tree structure 44. The remote files 42 stored on the removable media 49 are accessed from the perspective of the user program 22 in the same manner in which the remote files 42 stored on online devices 46 are accessed. The removable media resource file 194 contains access information that identifies a specific entity of removable storage media 49. The access information can be identified for standard media formats in the appropriate ANSI standard or in non-standard formats according to appropriate vendor supplied documentation.

At the time the identified removable media and a remote file 42 stored thereon are to be accessed (e.g., open time), the RM module 182 uses the removable media resource file 194 which has been preestablished for that particular removable media 49 to provide the necessary information to facilitate mounting of the removable media 49 on a secondary storage device 48 that can access the removable media 49. The RM module 182 automatically makes the connection to the removable media resource file 194 with the particular secondary storage device 48 on which the removable media 49 is mounted by using the Scan module 192 and the AML module 190. It will be noted that for management of tape files in the System V operating system, either a symbolic link or a shell variable is used to identify to the user program 22 a connection to the particular secondary storage device 48 on which the tape file has been mounted.

In the afs control program 40 of the present invention, this connection is created only when the remote file 42 is actually opened, thereby eliminating the window between the time the tape is requested to be mounted and the time the file is actually accessed (r/w). When access to the remote file 42 stored on the removable media 49 is terminated (e.g., close time), the RM module 182 releases the particular secondary storage device 48 on which the removable media 49 is mounted; however, in the preferred embodiment the removable media 49 remains physically attached to the secondary storage device 48 to facilitate later access to the remote file 42, until such time as an unload command is issued by the AML module 190 to free up a secondary storage device 48.

An integral task of the afs control program 40 with respect to removable media resource files 194 is the continual scanning of all removable media storage devices 48 associated with the file tree structure 44 by the Scan module 192. If a new removable media 49 has been mounted on one of the removable media storage devices 48, the Scan module 192 reads a label on the removable media and generates a removable media label record located in the Storage Device table 172 for that removable media 49. For robotically controlled removable media storage devices 48 (e.g., an optical disk jukebox or a cartridge tape jukebox), the AML module 190 is responsible for scheduling the mounting of requested medium contained within the storage library. The contents of the storage library are preestablished within a request table 178 in the table structures 160. The request table 178 is scanned for volume serial numbers of the removable medium 49 stored within its storage library, and if the requested volume serial number is found, the AML module 190 will send instructions to the robotic mechanism to remove the indicated removable medium 49 from the storage library if occupied and not active and mount the indicated removable medium 49 on an appropriate secondary storage device 48. The Scan module 192 then senses the presence of the requested removable medium 49 and informs the RM module 182 which completes the open request.

The RM module 182 provides for direct access to remote files 42
stored on removable media 49 without the need to stage the entire remote
file onto an online secondary storage device 46. Referring now to Figure
13, in response to a request to read a remote file 42 that is presently stored
on a removable media 49 (step 222), the RM module 182 examines the
direct access parameter of the file migration attributes 143 (Figure 7) to
determine if direct access to the remote file 42 is allowed (step 223). In the
preferred embodiment, for performance reasons, direct access is only
permitted for removable media 49 which are randomly positionable, such
as optical disks. If direct access is allowed, then the RM module 182 issues
an open command for the volume serial number as indicated by the
archive block pointer 144 (Figure 7) (step 224). Once the indicated
removable media 49 is opened, having first been mounted if necessary, the
RM module 182 uses the removable media resource file 194 to manage
direct access to the removable media 49 (step 225). If direct access is not
allowed, then the RM module 182 stages the remote file 42 on the indicated
removable media 49 to an online secondary storage device 46 (step 226)
and creates and uses the normal online control structures to manage
access to the remote file 42 now staged onto the online secondary storage
device 46 (step 227).
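
The choice between direct access and staging (steps 223-227) can be illustrated with a minimal sketch. The class name, field names and returned strings below are hypothetical and are not part of the described embodiment; the sketch only assumes that direct access requires both the file's direct access parameter and a randomly positionable medium.

```python
from dataclasses import dataclass


@dataclass
class ArchivedFile:
    # All field names are illustrative assumptions, not names from the source.
    name: str
    direct_access: bool   # direct access parameter of the migration attributes
    volume_serial: str    # volume indicated by the archive block pointer


def choose_access_path(f: ArchivedFile, randomly_positionable: bool) -> str:
    """Return 'direct:<volume>' or 'staged:<file>' for a read request."""
    # Direct access is only taken when the file allows it and the medium
    # (for example an optical disk) can be positioned randomly.
    if f.direct_access and randomly_positionable:
        return f"direct:{f.volume_serial}"
    # Otherwise the whole file is staged to an online device first.
    return f"staged:{f.name}"
```

A file flagged for direct access on a tape-like (sequential) medium would therefore still be staged in this sketch.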

Archiving (AR) Module

The AR module 184 controls the multiple level heterogeneous
archiving capability of the afs file system, and is also the space manager of
the storage space on the secondary storage devices 46, 48. Each of these
functions will be described in turn.

The purpose of the archiving function of the AR module 184 is to
automatically back up a remote file 42 by making an archival copy of that
file, thereby ensuring the integrity of that file in the event of a system crash
or other catastrophic error. In the preferred embodiment, up to four
different copies of a backup/archive image of a remote file 42 can be
created, thereby allowing either the user or the system administrator to
control the level of vulnerability associated with the long-term storage of a
remote file 42. For example, a first set of remote files 42 may have media
residency requirements that require the creation of two separate optical
disk copies of the files, whereas a second set of remote files 42 may have
media residency requirements that only call for a single tape backup to be
created.

The purpose of the space management function of the AR module
184 is to manage the available storage space on the on-line devices 46 to
ensure that sufficient online storage space is available to allow the network
data server 14 to function efficiently. The afs control program 40
maintains two "threshold" values for available storage space on the on-line
devices 46, which are defined by the system administrator. When on-line
disk space usage exceeds the high threshold, the AR module 184
automatically begins to purge or archive remote files 42 that are eligible for
elimination or archiving in accordance with their hierarchically selectable
archival file attributes 140. The remote files 42 that are eligible for
removal or archiving and have waited longest since last access will be
eliminated or archived first. This process of removal and archiving
continues until online disk space usage falls below the low threshold.
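
The two-threshold behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the embodiment's implementation: the function name, the tuple layout and the use of fractional thresholds are all hypothetical.

```python
def select_for_release(files, used, capacity, high, low):
    """Pick files to archive or remove, least recently accessed first.

    files:    list of (name, size, last_access, eligible) tuples (assumed layout)
    used:     space currently in use, in the same units as size
    high/low: thresholds expressed as fractions of capacity (assumption)
    """
    # Nothing happens until usage exceeds the high threshold.
    if used / capacity <= high:
        return []
    chosen = []
    # Files that have waited longest since last access are handled first.
    for name, size, last_access, eligible in sorted(files, key=lambda f: f[2]):
        # Stop as soon as usage has fallen below the low threshold.
        if used / capacity < low:
            break
        if eligible:
            chosen.append(name)
            used -= size
    return chosen
```

Using a separate low threshold gives hysteresis: once triggered, the process frees enough space that it is not immediately triggered again by the next few writes.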
Referring now to Figure 13, the AR module 184 uses a set of archive
selection attributes 230 to automatically control which remote files 42 will
be archived or removed in accordance with the hierarchically selectable
archival file attribute 140 specified for each remote file 42. In the preferred
embodiment, the archive selection attributes 230 are specified by the
system administrator for automatic archiving and removal of remote files
42 from the online secondary storage devices 46. The AR module 184 uses
the archival file attributes 140 indicated by the hierarchy level 145-149
specified for that remote file. If all selection criteria specified in the
selection attributes 230 are met by the file attributes 140 and the file access
information 138, then that remote file 42 is eligible for archiving and/or
removal. The selection criteria for the preferred embodiment of the
archive selection attributes 230 are as follows:



Group              Specifies a list of acceptable groups. If the file
                   belongs to any one of the specified groups, it is
                   eligible.
User               Specifies a list of acceptable users, and is similar
                   to group selection.
Life Span          Specifies a range of acceptable life spans.
File Size          Specifies a range for file size.
Media Residency    Specifies media residency requirements. For
                   example: all files with a first level archive
                   requirement of video.
Archive Status     Specifies the archive requirements. For
                   example: all files with an existing first level
                   archive which have not been archived at the
                   second level.
Last Access Time   Specifies a range of time since last access.
Creation Time      Specifies a range of time for creation.
Cycle flag         Specifies whether to consider cycles in the
                   selection process.
Cycle Life Span    Specifies a range of acceptable cycle life spans.
Search Path Root   Specifies the starting directory for the file search.
Archive Size       Specifies a range of acceptable total archive file
                   size when generating the list of files to be
                   archived. Once the maximum limit for a
                   targeted archival media has been reached, the
                   search stops. If the minimum limit has not
                   been reached the archive will not occur.

To accomplish both the archiving function and the space
management function, the AR module 184 calls several processes that
use the archival file attributes 140 to determine what action to take on
remote files 42 resident on the online secondary storage devices 46. These
processes are shown in Figure 13 and Figures 14a-14d and may be initiated
at scheduled intervals, by the crossing of the high threshold, or by operator
action (step 250) (Figure 14a). Figure 14a shows the Monitor 232, which
scans the inodes 106 for all online storage media 46 and compares the
archival selection attributes 230 to the file information 118 (step 251) to
build three lists of files (step 252): Archivable Files 234, i.e., files which
have not yet been archived and whose file information 118 meets the
selection criteria established by the archival selection attributes; Releasable
Files 236, i.e., files which have been archived or whose life spans have
expired and whose online disk space may therefore be released; and
Purgable Files 238, i.e., files whose life spans have expired. Depending on
the contents of each list 234, 236 and 238, the Monitor 232 might initiate
any or all of the Archiver 240, the Releaser 242 or the Reaper 244 (steps
253-260).
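
The Monitor's scan can be pictured with a short sketch that builds the three lists from per-file information. Only two of the selection criteria (group and file size) are modeled, and every name below is a hypothetical stand-in for the inode fields and selection attributes described above.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    # Illustrative fields only; the real file information 118 holds more.
    name: str
    group: str
    size: int
    archived: bool
    expired: bool   # True once the file's life span has been exceeded


def meets_criteria(info, groups=None, size_range=None):
    """All specified criteria must be met; unspecified criteria are ignored."""
    if groups is not None and info.group not in groups:
        return False
    if size_range is not None and not (size_range[0] <= info.size <= size_range[1]):
        return False
    return True


def monitor_scan(files, groups=None, size_range=None):
    """Return (archivable, releasable, purgable) lists of file names."""
    archivable = [f.name for f in files
                  if not f.archived and meets_criteria(f, groups, size_range)]
    releasable = [f.name for f in files if f.archived or f.expired]
    purgable = [f.name for f in files if f.expired]
    return archivable, releasable, purgable
```

Note that, following the definitions in the text, an expired file appears on both the releasable and the purgable list.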

In the preferred embodiment of the afs control program 40, once
files have been archived or purged, the on-line disk space they occupy may
be quickly released in the event of a large influx of new data. However,
remote files 42 are not typically released from on-line storage 46 until the
space is needed (step 258), thereby maximizing the possibility that a
requested remote file 42 will still be resident on disk 46, rather than
requiring that access be made to the removable media 49 to which the
remote file 42 was archived.

As shown in Figure 14b, the Archiver 240 creates copies of files for
each remote file listed in the Archivable files 234 on the targeted archival
media 49 for that file. The targeted archival media 49 is requested (step
260) and the remote file 42 on the online secondary storage device 46 is
copied to the removable media file 49 (step 261). Once accomplished, the in
core inode 106 for the version of the remote file 42 stored on the online
device 46 is marked as archived (step 262) and the removable media file 49
is closed (step 263).

Figure 14c shows how the Releaser 242 releases the storage space of
the online secondary storage device 46 associated with remote files 42. The
Releaser 242 examines the in core version of the inode 106 for each file
listed on the Releasable files 236 (step 265). If the archival requirements
have been met and the remote file 42 has been successfully archived to a
removable archival media 49 (step 266), then the disk space is marked as
released and the remote file 42 is considered off line (step 267).

Finally, Figure 14d shows how the Reaper 244 eliminates from the
entire network data server 14 all remote files 42 that have exceeded their
life span. The Reaper 244 examines the in core version of the inode 106
for each remote file 42 listed on the Purgable files 238 (step 270). If the file
life span parameter in the archival attributes 140 has been exceeded (step
271), the Reaper 244 removes the remote file 42 from the network data
server 14 by issuing a remove system call having root permission to
remove the remote file 42 (step 272).
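
The three processes can be summarized in a small sketch operating on a simplified in-core inode. The `Inode` class, the dictionary standing in for removable media, and the integer time units are assumptions made for illustration; the real Reaper issues a remove system call rather than deleting a dictionary entry.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    # Hypothetical, simplified stand-in for the in-core inode 106.
    name: str
    data: bytes
    created: int
    life_span: int
    archived: bool = False
    online: bool = True


def archive(inode, media):
    """Archiver: copy the file to the archival media, then mark it archived."""
    media[inode.name] = inode.data
    inode.archived = True


def release(inode):
    """Releaser: free online space only if the file has been archived."""
    if inode.archived and inode.online:
        inode.online = False
        return True
    return False


def reap(inode, catalog, now):
    """Reaper: remove the file entirely once its life span is exceeded."""
    if now - inode.created > inode.life_span:
        catalog.pop(inode.name, None)
        return True
    return False
```

Keeping release separate from archive reflects the point made earlier that archived files stay online until the space is actually needed.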

File System (FS) module

The FS module 186 manages the control information for all of the
file trees 44 that are mounted on and controlled by a particular instance of
the file control program 40, as well as determining the logical addresses for
all remote files 42 stored on those file trees 44. It will be recognized that
when there are multiple file processors 58 within the network data server
14, multiple instances of the file control program 40 will be executing
simultaneously, one in each file processor 58. Within each afs control
program 40, the FS module 186 keeps track of which file trees 44 are
mounted for that archiving file system using information in the mount
table 162 that is accessed by the FS module 186.

In managing the control information for the file trees 44, the FS
module 186 acts primarily in response to directory and inode management
commands from the host processor 56. Basically, these directory and inode
management commands are similar in function to the System V directory
and inode management commands as described in Bach, M., The Design
of the Unix® Operating System (1986), Prentice Hall, Chpt. 4, pp. 60-98. A
list of functions performed by the FS module includes:

Get inode       Create a new inode 106.
Free inode      Release the inode 106 for a remote file 42 as a
                result of removal of the file 42 corresponding to
                the inode 106.
Read inode      Read an inode 106 based on a given inode
                number.
Write inode     Update an inode 106 based on a given inode
                number.
Create Dir      Create a new directory file.
Remove Dir      Release a directory file as a result of the removal
                of that directory from the file tree 44.
Read Dir        Read a directory file, either as part of a
                pathname lookup, or in response to a DIR
                command from a user.
Write Dir       Update a directory file.
Mount           Mount the file tree (see description in IO
                module section).
Sync            Update control information from cache buffer to
                disk, as described in co-pending application
                entitled "METHOD AND APPARATUS FOR
                FILE RECOVERY FOR SECONDARY STORAGE
                SYSTEMS".



Storage Family Sets

The preferred embodiment of the FS module 186 supports remote
files 42 that can be stored on online secondary storage devices 46 that are
organized as storage family sets. Unlike prior art file systems that
restricted file trees 29, 38 to a single physical online secondary storage
device, the afs control program 40 of the present invention can establish a
file tree 44 which can exist on multiple physical online secondary storage
devices 46. Figure 15 shows a block diagram depiction of a storage family
set 300 in accordance with the preferred embodiment of the afs control
program 40. The storage family set 300 is unlike prior art multiple disk
storage devices, such as redundant array of inexpensive disks (RAID)
devices, which appear to the standard file interface 24 as a single physical
storage device. Instead, the logical allocation units for assigning disk
blocks are allowed to span multiple physical devices on a block-by-block
basis as described below in a manner that is equivalent to, but different than,
RAID level zero striping. As a result, the effective data transfer rates
which can be sustained are directly related to the number of online
secondary storage devices 26 and device controllers 46 which are operating
in parallel in the data server.

Figure 15 shows a block diagram of a typical storage family set 300.
A label sector 302 is defined on each disk drive 46 to identify the storage
family set 300 of which that particular disk drive 46 is a member. For the
first two disks in the family set 300, the next sector 304 is used to store the
control information that is backed up in accordance with the sync point
procedures previously described. As part of the label sector 302, an ordinal
number 304 is assigned for that disk drive 46-0, 46-1, 46-2, 46-3, within the
storage family set 300. The label sector 302-0, 302-1, 302-2 and 302-3 also
contains the equipment topology of the entire storage family set 300 at the
time the set 300 was created by the system administrator, or when one or
more additional disk drives 46 were last added to the set 300. The
equipment topology will include a family set name 306, the number of
disks in the family set 307, and the family set configuration 308.

In the preferred embodiment, the size of the small and large disk
allocation units remains constant and is independent of the number of
disk drives 46 in the storage family set 300. Additional disk drives 46 can
be added to a storage family set at any time. The membership ordinal
number 304 of any added disk drive 46 will be unique for that particular
storage family set 300. The extent array 120 in the control portion of the afs
file structure 100 not only points to a disk block 104 as a disk sector, but
also identifies the disk block 104 by including the family set ordinal 304 as
part of the extent array 120.

The equipment topology is informational and is not required for 
usage of the storage family set 300 by the afs control program 40. It does, 
however, allow individual disk drives 46 to be moved from file tree 44 to 
file tree 44, or be reconfigured on different equipment. All that is
required for the file tree 44 containing the storage family set 300 to be
mounted by the afs control program 40 is the presence of all members of
the set 300. Should a member in the set become non-functional (i.e., data
stored on the disk drive 46-1 becomes unreadable), recovery operations
within the afs control program 40 will correct all index references to the
defective disk drive 46-1 and a new disk drive 46-4 (not shown) can be
added to the storage family set 300 in place of the defective disk drive 46-1.
In this case, the replacement disk drive 46-4 will have the same
membership ordinal number as the replaced disk drive.

Striping and Shadowing 

The FS module 186 makes use of the storage family sets 300 to
support software striping. When a remote file 42 is created on a storage
family set 300, the disk blocks 104 assigned to that remote file 42 may exist
on any disk drive 46-0, 46-1, 46-2, 46-3 within the storage family set 300
associated with the file tree 44 on which the remote file 42 is to be stored.
Blocks for the remote file 42 are assigned to the storage family set on a
forward-end around basis (i.e., round robin). However, should a
particular disk drive, disk drive 46-1 for example, become full, the disk
block 104 to be stored will be allocated space on the next disk drive, disk
drive 46-2, in the round robin sequence. Thus, it is not a requirement for
striped files to exist with a rigid disk assignment order, as is the case with
prior art disk striping techniques, such as RAID.

In the example shown in Figure 15, four separate disk drives 46-0,
46-1, 46-2 and 46-3 are defined as a storage family set 300. When a remote
file 42 is to be stored on the storage family set 300, the afs control program
40 allocates the necessary number of logical disk blocks 104 on a round
robin basis starting with drive 46-0 and proceeding forward to drive 46-3.
The afs control program 40 would allocate the blocks of a file having
twelve total blocks such that blocks 0, 4 and 8 are stored on disk drive 46-0;
blocks 1, 5 and 9 are stored on disk drive 46-1; blocks 2, 6 and 10 are stored on
disk drive 46-2; and blocks 3, 7 and 11 are stored on disk drive 46-3. For
striped files, the next disk to be allocated a disk block is computed as
follows:

1. next_disk = (current_disk + 1) modulo number_of_disks
   current_disk = next_disk
   if (space not available on current_disk) goto 1
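
The round robin rule above can be exercised with a short runnable sketch. The function names and the per-disk free-block list are hypothetical. One deliberate difference from the pseudocode is labeled in the comments: the search is bounded, so the sketch raises an error instead of looping forever when every disk is full.

```python
def next_disk(current, free_blocks):
    """Advance round robin from `current`, skipping disks with no free space."""
    n = len(free_blocks)
    # Bounded search (an addition to the pseudocode, which would loop
    # indefinitely if no disk in the family set had space).
    for step in range(1, n + 1):
        candidate = (current + step) % n
        if free_blocks[candidate] > 0:
            return candidate
    raise RuntimeError("no space available on any disk in the family set")


def allocate(num_blocks, free_blocks, start=-1):
    """Return the disk ordinal chosen for each block of a file."""
    free = list(free_blocks)          # do not modify the caller's list
    placement, current = [], start    # start=-1 so the first block goes to disk 0
    for _ in range(num_blocks):
        current = next_disk(current, free)
        free[current] -= 1
        placement.append(current)
    return placement
```

With four disks that all have space, `allocate(12, [10, 10, 10, 10])` reproduces the twelve-block example: blocks 0, 4 and 8 land on disk 0, blocks 1, 5 and 9 on disk 1, and so on.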
The afs control program 40 is also capable of automatically creating a
shadow file using the storage family set 300. In this case a shadow file or
second image of the remote file 42 is created in parallel with each original
data image being stored. The afs control program 40 allocates half of the
disk drives in a storage family set, disk drives 46-0 and 46-2 as shown on
the left half of the dotted line for example, to store original data, and the
remaining half, disk drives 46-1 and 46-3, is used to store the shadow file,
thereby providing an automatic level of online storage redundancy. In the
preferred embodiment, the disk drives 46 are interleaved with the original
image being stored on the even number disk drives 46-0 and 46-2, and the
shadow image being stored on the odd number disk drives 46-1 and 46-3.
Use of this approach allows the afs control program 40 to easily support
concurrent striping and shadowing of remote files 42 without requiring
any additional software or hardware controls. It will be seen that, as long
as disk drives 46 are added to a storage family set 300 in pairs, the storage
family set 300 can be expanded at any time.
In the example of a storage family set 300 comprised of four disk
drives as shown in Figure 15, drives 46-0 and 46-2 could be the primary
storage set for storing files, and drives 46-1 and 46-3 could be the shadow
storage set for automatically storing the shadow copy of the files. For
striped and shadow files, the next disk to be allocated a disk block is
computed as follows:

1. next_disk = (current_even_disk + 2) modulo number_of_disks
   current_even_disk = next_disk
   if (space not available on current_even_disk) goto 1

2. next_disk = (current_odd_disk + 2) modulo number_of_disks
   current_odd_disk = next_disk
   if (space not available on current_odd_disk) goto 2
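
A runnable sketch of the combined striping and shadowing rule follows. It assumes, as the text does, an even number of disks with original data on even ordinals and the shadow image on odd ordinals; the function names and the free-block list are hypothetical, and the search is bounded rather than looping indefinitely.

```python
def next_in_half(current, free_blocks):
    """Step by 2 so the search stays on disks of the same parity."""
    n = len(free_blocks)
    for step in range(2, n + 2, 2):
        candidate = (current + step) % n
        if free_blocks[candidate] > 0:
            return candidate
    raise RuntimeError("no space available in this half of the family set")


def allocate_shadowed(num_blocks, free_blocks):
    """Return (primary_disk, shadow_disk) for each block of a file."""
    n = len(free_blocks)
    if n % 2 != 0:
        raise ValueError("shadowing assumes disks are added in pairs")
    free = list(free_blocks)
    even, odd = -2, -1   # so the first block pair lands on disks 0 and 1
    placement = []
    for _ in range(num_blocks):
        even = next_in_half(even, free)
        free[even] -= 1
        odd = next_in_half(odd, free)
        free[odd] -= 1
        placement.append((even, odd))
    return placement
```

Because the step and the disk count are both even, a primary block can never land on a shadow disk or the reverse, which is why no extra controls are needed to keep the two images apart.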

Input/Output (IO) module

The IO module 184 performs the operations for the actual remote
file commands, such as mount, open, read, write, etc. Figures 16a-16f are
flowcharts showing how these file commands are implemented.

Figure 16a shows the steps for a mount command 400 to mount a
file tree 44 on the data server 14. First, the family storage set 300 for the file
tree 44 is configured (step 402). Then the release inode table 174 and
release block table 176 for that file tree are created in the data structures 160
(step 404). The super block 102, including the disk block allocation bit map
110, is read in from the secondary storage device on which the file tree 44
is resident (step 406). Then the first three inode files, 106-0, 106-1 and 106-2,
are read into a cache buffer established by the buffer manager module 180
for the control information for the file tree 44 (steps 408, 410 and 412).
Once this control information is available to the afs file control program
40, the file tree 44 is mounted and a response to the mount command is
returned to the user (step 414).

Figure 16b shows the steps for an open file command 420 to open a
remote file 42 on the file tree 44. First, the IO module 184 looks up the
pathname for the remote file 42 and obtains the inode 106 for remote file
42 (step 422). If the inode 106 is in core already, for example because
another file 42 having its inode 106 in the same logical block 104 is already
in core, then the inode 106 is marked as open. Otherwise, the IO module
reads the inode 106 from the secondary storage device 46 (all inodes 106 are
maintained on the online disk drives 46) and creates an in core version of
the inode 106 in the cache buffer for the control information for the file
tree 44. If the file 42 is archived (step 424), the IO module 184 gets the
removable media resource file 194 for the file 42 (step 426) and calls the RM
module 182 to mount the removable media 49 (step 428). If the file 42 is
not archived, a check is made to see if the file 42 is a resource file (step 430).
If not, the remote file 42 is a regular file and no additional processing is
necessary by the IO module 184 to open the file (step 432). If the remote
file 42 is a resource file, then again the removable media resource file 194 is
acquired (step 434) and the removable media 49 is mounted (step 436)
before returning to the user. For any archived files, a further check is
made to see if direct access is allowed for the archive file (step 438). As
previously described in the description of the RM module 182, if direct
access is not allowed, an additional step of staging the file from a
removable media storage device 46 to an online storage device 46 must be
performed (step 439).

Figure 16c shows the steps for a read command 440. The first step is
to lock the communication buffers which have been assigned to service
the remote file request (step 442). Depending upon how the request is
made (e.g., NFS or FTP), the communication buffers may be assigned in
the communication processor 54 or the host processor 56. The next step is
to determine the actual address within the communication buffer that is
being accessed by this read command (step 444). For each block 104 that
must be read to satisfy the read command 440, a loop is made to see if that
block 104 is already in a cache buffer and, if not, a cache buffer is assigned
and the block 104 is read from the device, until all blocks 104 for the read
are in cache buffers (steps 445-449). Once the blocks 104 are all in the cache
buffers, the cache buffers are marked as in cache (step 450) and the data is
then transferred by DMA from the cache buffer defined in the buffer
memory 64 to the communication buffer (step 452). Finally, the
communication buffers are unlocked (step 454) and the read command is
completed (step 456).
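
The block loop of the read command (steps 445-449) amounts to a read-through cache. The sketch below models the cache and the device as dictionaries keyed by block number, which is an assumption made purely for illustration; buffer locking and the DMA transfer are not modeled.

```python
def read_blocks(block_numbers, cache, device):
    """Ensure every requested block is cached, then return the joined data."""
    for b in block_numbers:
        # Only blocks that are not already in a cache buffer are read
        # from the device.
        if b not in cache:
            cache[b] = device[b]
    # Stand-in for the transfer from the cache buffers to the
    # communication buffer.
    return b"".join(cache[b] for b in block_numbers)
```

A second read of the same blocks is then served entirely from the cache without touching the device.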

Figure 16d shows the steps for a write command 460. The first two
steps are the same as for the read command (steps 442 and 444). A check is
then made to see if a partial buffer is being written (step 462). If so, the
remaining portion of the buffer must be read in from the device before the
entire buffer can be written out (step 464). Once the entire buffer is ready
to be written, the buffer is transferred by DMA from the communication
buffer to the cache buffer defined in the buffer memory 64 (step 466) and
the cache buffer is marked as in cache (step 468). A check is made to see if
the write through option is set (step 470). If not, the communication
buffers can be unlocked (step 472) before the cache buffer is written to the
device (step 474). Otherwise, the cache buffer is written to the device (step
476) and then the communication buffer is unlocked (step 478) before the
write command is completed (step 479).

Figure 16e shows the close file command 480. A check is made to
see if the remote file 42 is a resource file or a direct access file (step 482). If
not, the file is marked as closed and the IO module returns (step 484). If
the file is a resource file or a direct access file, the media file is closed (step
486). An activity count for the particular removable media 49 is
decremented (step 488). If the activity count for that removable media 49
is zero (step 490), then the removable media 49 can be unloaded (step 492)
before returning (step 484).
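
The activity count used by the close command is a simple reference count per removable medium. A minimal sketch, assuming a dictionary of counts keyed by a hypothetical volume identifier:

```python
def close_file(is_media_file, volume, activity):
    """Close a file and report whether its removable medium may be unloaded.

    is_media_file: True for a resource file or a direct access file
    activity:      dict mapping volume identifier to its open-file count
    """
    if not is_media_file:
        # Regular files are simply marked closed; no medium is involved.
        return False
    activity[volume] -= 1
    # The medium can be unloaded only when no open file still uses it.
    return activity[volume] == 0
```

With two files open on the same volume, the first close leaves the medium mounted and only the second close allows it to be unloaded.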

Figure 16f shows the dismount command 500 for dismounting a file
tree 44. When an attempt is made to dismount a file tree 44, a check is
made to see that inodes 106-0, 106-1 and 106-2 are the only active
inodes for that file tree 44 (step 502). If not, then a message is returned
indicating that the file tree 44 is still busy (step 504). If the file tree 44 is not
busy, then inodes 106-0, 106-1 and 106-2 are marked as inactive (step 506).
To ensure correctness of the control structure, a sync point of the file
system is forced (step 508) after which the file tree 44 is removed from the
mount table 162.

1. A file system that is part of an operating system program executing
in a distributed computer processing network having a plurality of
computer processors operably connected to one or more data servers each
[...] a remote secondary storage system for storing [...]
remote files of data information, the file system comprising:
    control structure means for each data server for storing
    control information for each remote file stored on that data server,
    the control structure means for each remote file being stored on the
    data server as part of one or more addressable control files having
    space on the secondary storage system that is dynamically allocated
    in the same manner in which space is allocated for any other [...];
    [...] identifying name for each remote file stored on that data server and
    [...] control structure means for that remote file; and
    program means for [...]
    from one or more computer programs executing on the distributed
    computer processing network to operate on an indicated one of the
    remote files by selectively accessing the directory structure means
    [...] data server on which the [...]
    and the control structure means for the data server on which the
    remote file is stored in order to obtain access to the control
    information and the data information for the indicated one of the
    remote files.



2. The file system of claim 1 wherein the one or more addressable
control files containing the control structure means comprises:
    extent array means for storing an array of pointers to a
    sequence of logical blocks that is dynamically defined on the data
    server where the control structure means for each remote file is
    stored.

3. The file system of claim 2 wherein each block in the sequence of
logical blocks contains the control structure means for two [...]

[...] each remote file comprises:
    [...] for storing a set of hierarchical attributes
    associated with the remote file; and
    extent array means for [...]
    sequence of logical blocks that is dynamically allocated [...]
    server where the data information for the remote file is stored.

[...] of the hierarchical attributes
includes a file lifespan attribute that defines a length of time after which
the remote file is automatically deleted from the data server.

[...] will be created and [...] each time a
new version of the remote file is stored.

The file system of claim 4 wherein the data server [...]
more short-term direct access storage devices and
[...] media archive storage devices [...]
automatically archive remote files stored on the direct access storage
devices as archive files stored on the archive storage devices in accordance
with selected ones [...]

9. The file system of claim 7 further wherein at least one of the data
servers has a plurality of direct access storage devices that are organized as
a storage family set comprising a plurality of physically unique direct
access storage devices that are collectively accessed by the program means
on a block-by-block basis such that the program means implements
software striping by arranging a plurality of blocks comprising a remote
file stored on the storage family set to be stored with selected ones of the
blocks stored in their entirety on separate ones of the physically unique
direct access storage devices.

10. The file system of claim 9 wherein the storage family set comprises
at least two direct access storage devices and wherein the program means
implements software shadowing by selectively storing a shadow copy of a
remote file by partitioning the storage family set in a pair of storage family
subsets, each storage family subset having an equal number of secondary
storage devices and automatically storing the plurality of blocks
comprising the remote file on both pairs of storage family subsets.

11. In a file system that is part of an operating system program
executing in a computer processing system that includes a secondary
storage system, a system for allocating logical storage units in the
secondary storage system in response to a request to store a file of a given
size comprising:
    control means for storing a table of contents for the file
    identifying one or more logical storage units in which the file is
    stored in the secondary storage system;
    first allocation means for allocating one or more of a first
    number of first logical storage units representing a space of a first
    predefined size in the secondary storage system in which to store
    the file until a total amount of the space represented by the first
    logical storage units is greater than or equal to the given size of the
    file or a total number of the allocated first logical storage units is
    equal to the first number of first logical storage units; and
    second allocation means for allocating a second number of
    second logical storage units representing a space of a second
    predefined size in the secondary storage system in which to store
    the file if the total number of allocated first logical storage units is
    equal to the first number of first logical storage units until a total
    amount of the space represented by the first logical storage units and
    the second logical storage units is greater than or equal to the given
    size of the file,
    wherein the second predefined size is larger than the
    first predefined size.

12. The system of claim 11 wherein the second number of logical
storage units includes a first set of logical storage units addressed by direct
pointers and a second set of logical storage units addressed by indirect
pointers and wherein the second allocation means comprises:
    direct allocation means for allocating one or more second
    logical storage units of the first set if the total number of first
    number of first logical storage units is equal to the first number of
    first logical storage units until a total amount of the space
    represented by the first logical storage units and the second
    logical storage units of the first set is greater than or equal to the
    given size of the file or until the total number of allocated second
    logical storage units is equal to the number of logical storage units
    of the first set; and
    indirect allocation means for allocating one or more second
    logical storage units of the second set if the total number of allocated
    second logical storage units is equal to the number of logical storage
    units of the first set until a total amount of the space represented by
    the first logical storage units and second logical storage units is
    greater than or equal to the given size of the file.

1 13. In a computer processing system including a file system for storing 

2 data on a secondary storage system connected to the computer processing 

3 system/ the file system having control information that is maintained in a 
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4 cache memory, a system for backing up the control inf oraiation to the 

5 secondary storage system comprising: 

6 means for initiating a periodic backup of the control 

7 information ^om the cache memory to the secondary storage 

8 system, including means for generating a unique control stamp 

9 value for each Iteration of the periodic backup; 

10 means for backing up a first and second copy of the control 

11 information to a first and second logical storage device, respectively, 

12 in the secondary storage system in response to the meahs for 

13 initiating a periodic backup, including: 

14 means for writing the unique control stamp value to a 

15 first control stamp location on the first and second logical

16 storage device, respectively, prior to backing up the first and

17 second copy of the control information; and

18 means for writing the unique control stamp value to a

19 second control stamp location on the first and second logical

20 storage device, respectively, after backing up the first and

21 second copy of the control information.
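
The stamp-bracketed backup of claim 13 can be sketched as follows. Each logical storage device is modeled as a plain dictionary, and a counter stands in for the means that generates a unique control stamp value. The names `new_device`, `backup_copy`, and `periodic_backup` are hypothetical, and writing the first device completely before the second is an assumption; the claim does not fix that order.

```python
import itertools

# Hypothetical stamp source: one new value per backup iteration.
_stamps = itertools.count(1)


def new_device():
    """Model a logical storage device with two stamp locations."""
    return {"stamp_before": None, "control": None, "stamp_after": None}


def backup_copy(device, control_info, stamp):
    """Write the stamp, then the copy, then the stamp again."""
    device["stamp_before"] = stamp          # first control stamp location
    device["control"] = dict(control_info)  # copy of the control information
    device["stamp_after"] = stamp           # second control stamp location


def periodic_backup(cache_control, first_device, second_device):
    """Back up the cached control information to both devices."""
    stamp = next(_stamps)
    backup_copy(first_device, cache_control, stamp)
    backup_copy(second_device, cache_control, stamp)
    return stamp
```

If a backup is interrupted between the two stamp writes on a device, the two stamp locations on that device hold different values, which is what allows a later recovery step to detect an incomplete copy.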

1 14. The system of claim 13 further comprising:

2 sync means for preventing the updating of any of the control

3 information from the cache memory to the secondary storage

4 system other than during one of the periodic or forced backups of

5 the control information.

1 15. The system of claim 14 wherein the means for backing up the first

2 and second copy of the control information includes:

3 means for merging any inodes that have been released since a

4 previous sync point into an inode allocation mechanism of the

5 control information of the file system and writing the inode 

6 allocation mechanism to the logical device; 

7 means for merging any logical blocks that have been released 

8 since the previous sync point into a logical block allocation

9 mechanism of the control information of the file system and

10 writing the block allocation mechanism to the logical device;

11 means for writing any directory files that have been changed

12 since the previous sync point to the logical device; and

13 means for writing any inode information that has been

14 changed since the previous sync point to the logical device.
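
The merge steps of claim 15 can be illustrated with a small sketch of an allocation mechanism that holds released entries aside until the next sync point. The class `Allocator`, its methods, and the dictionary used as the logical device are all hypothetical. Keeping released entries out of circulation until the merge is an assumption made here so that the on-device copy stays consistent; the claim itself only recites the merge and the write.

```python
class Allocator:
    """Hypothetical inode or logical block allocation mechanism."""

    def __init__(self, free):
        self.free = set(free)   # entries available for allocation
        self.released = set()   # entries released since the previous sync point

    def allocate(self):
        """Hand out the lowest-numbered free entry."""
        entry = min(self.free)
        self.free.remove(entry)
        return entry

    def release(self, entry):
        """Record a release; the entry is not reusable until the merge."""
        self.released.add(entry)

    def merge_at_sync(self, device, key):
        """Merge released entries and write the mechanism to the device."""
        self.free |= self.released
        self.released.clear()
        device[key] = sorted(self.free)
```

The same class can stand in for either the inode allocation mechanism or the logical block allocation mechanism, with a different `key` on the device for each.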

1 16. The system of claim 15 wherein the means for writing any directory 

2 files and the means for writing any inode information are accomplished 



3 by contemporaneously marking any blocks of the control information in 

4 the cache memory as they are updated as dirty buffers, writing the dirty 

5 buffers to the logical device at the sync point and then unmarking those 

6 buffers. 
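
The dirty-buffer technique of claim 16 can be sketched as a cache that marks each block as it is updated, writes the marked blocks at the sync point, and then unmarks them. The class `ControlCache` and the dictionary standing in for the logical device are hypothetical names used for illustration only.

```python
class ControlCache:
    """Hypothetical cache of control information blocks."""

    def __init__(self):
        self.buffers = {}   # block id -> cached contents
        self.dirty = set()  # ids of blocks updated since the last sync point

    def update(self, block_id, data):
        """Update a cached block and mark it dirty at the same time."""
        self.buffers[block_id] = data
        self.dirty.add(block_id)

    def sync(self, device):
        """Write every dirty buffer to the device, then unmark them."""
        written = sorted(self.dirty)
        for block_id in written:
            device[block_id] = self.buffers[block_id]
        self.dirty.clear()
        return written
```

Nothing reaches the device between sync points in this sketch, which is also the behavior the sync means of claim 14 calls for.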



1 17. The system of claim 13 further comprising:

2 recovery means for recovering the control information

3 backed up on the secondary storage system ^i^

4 unscheduled luurd s^^

5 determining which of the first and second copies of the control 

6 information is accurate and using that copy of the control

7 information to recover the file system. 

1 18. The system of claim 17 wherein the recovery means determines

2 which of the first and second copies of the control information is accurate

3 according to the following conditions: 



4 if the control stamp value in the first control stamp location 

5 is equal to the control stamp value in the fourth control stamp

6 location, then using the first copy of the control information for the

7 file system on the first logical device to recover the file system;

8 if the control stamp value in the first control stamp location 

9 is equal to the control stamp value in the second control stamp 

10 location, then using the first copy of the control information for the

11 file system on the first logical device to recover the file system;

12 otherwise using the second copy of the control information

13 for the file system on the second logical device to recover the file

14 system. 
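
The selection rule of claim 18 can be sketched in simplified form. The sketch covers only the comparison of the first and second control stamp locations on the first logical device, with the second copy as the fallback; the comparison against a fourth control stamp location is left out because that location is not defined in this part of the claims. The function name `choose_copy` and the dictionary layout of a device are hypothetical.

```python
def choose_copy(first_device, second_device):
    """Pick the copy of the control information to recover from.

    Each device is a dict with hypothetical keys "stamp_before",
    "control", and "stamp_after".
    """
    # Matching stamps mean the first copy was written completely.
    if first_device["stamp_before"] == first_device["stamp_after"]:
        return first_device["control"]
    # Otherwise fall back to the second copy.
    return second_device["control"]
```

For example, if a backup was interrupted while the first copy was being written, the first device holds a newer leading stamp than trailing stamp, and the sketch returns the control information from the second device.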